Estimating and assessing the risk of a large portfolio is an important topic in financial
econometrics and risk management. The risk is often estimated by substituting a good estimator
of the volatility matrix. However, the accuracy of such a risk estimator for large portfolios
is largely unknown, and a simple inequality in the previous literature gives an infeasible
upper bound for the estimation error. In addition, numerical studies illustrate that this upper
bound is very crude. In this paper, we propose factor-based risk estimators for a large number
of assets, and introduce a high-confidence level upper bound (H-CLUB) to assess the accuracy of
the risk estimation. The H-CLUB is constructed based on three different estimates of the
volatility matrix: the sample covariance, an approximate factor model with known factors, and
an approximate factor model with unknown factors (POET, Fan, Liao and Mincheva, 2013). For the
first time in the literature, we derive the limiting distribution of the estimated risks in
high dimensionality. Our numerical results demonstrate that the proposed upper bounds
significantly outperform the traditional crude bounds, and provide insightful assessment of the
estimation of the portfolio risks. In addition, our simulated results quantify the relative
error in the risk estimation, which is usually negligible using 3-month daily data. Finally,
the proposed methods are applied to an empirical study.

1 Introduction



The potential loss of a portfolio is termed the portfolio risk. There are two types of
portfolio risks. The systematic risk (or market risk) is the risk inherent to the entire market, 
such as risk associated with interest rates, currencies, recession, war and political instability, 
etc. The systematic risk cannot be diversified away, even with a well-diversified portfolio. 
In contrast, specific risk (or idiosyncratic risk) refers to the risk that affects a very specific 
group of securities or even an individual security. For example, it can be the risk of price 
changes due to the unique circumstances of a specific stock. Unlike systematic risk, specific 
risk can be reduced through diversification. 

Estimating and assessing the risk of a large portfolio is an important topic in financial
econometrics and risk management. The risk of a given portfolio allocation vector $w$ is
conveniently measured by $(w'\Sigma w)^{1/2}$, in which $\Sigma$ is a volatility (covariance)
matrix of the assets' returns. Often multiple portfolio risks are of interest, and hence it is
essential to estimate the volatility matrix $\Sigma$. The problem becomes challenging when the
portfolio size is large. Suppose we have created a portfolio from two thousand assets and
invested in a selected subset of them. The covariance matrix $\Sigma$ involved then contains
over two million unknown parameters. Yet the sample size based on one year's daily data is
around 252. It is hard to assess the estimation accuracy when the estimation errors from more
than two million parameters are aggregated. Hence some regularization method is recommended to
estimate and assess risks.
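
To make the dimensionality concrete, the short Python sketch below counts the distinct entries of a symmetric covariance matrix for the two-thousand-asset example and sets that count against roughly 252 daily observations. The function name `covariance_parameter_count` is an illustrative choice, not notation from the paper.

```python
def covariance_parameter_count(n_assets: int) -> int:
    """Number of distinct entries in a symmetric n_assets x n_assets matrix."""
    return n_assets * (n_assets + 1) // 2


n_assets = 2000   # size of the asset universe in the text's example
n_days = 252      # roughly one year of daily returns

n_params = covariance_parameter_count(n_assets)
print(n_params)            # 2001000 distinct variances and covariances
print(n_params / n_days)   # unknown parameters per observation date
```

Each observation date supplies only one return per asset, so the number of unknowns grows quadratically in the number of assets while the data grow linearly.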

Interest in large portfolios has surged recently. Pesaran and Zaffaroni (2008) examined
the asymptotic behavior of the portfolio weights. Brodie et al. (2009) addressed the problem
of portfolio selection using a regularization penalty. Gomez and Gallon (2011) numerically
compared several methods of covariance matrix estimation for portfolio management. In
particular, the optimal portfolio selection involves inverting an estimated $\Sigma$, which is
a challenging problem under a large number of assets. Related literature also includes Jacquier
and Polson (2010), Antoine (2011), Chang and Tsay (2010), DeMiguel et al. (2009), Ledoit
and Wolf (2003), El Karoui (2010), Lai et al. (2011), Bannouh et al. (2012), Gandy and
Veraart (2012), Bianchi and Carvalho (2011), among others.

This paper contributes to the literature in at least four aspects. First of all, we propose
risk estimators based on factor analysis. Traditionally $\Sigma$ is estimated by the sample
covariance. However, when the number of assets is larger than the sample size, it is well known
that the sample covariance is singular, which may result in an estimated risk being zero for
certain portfolios. By assuming a factor structure on the returns, we obtain strictly positive
definite covariance estimators $\widehat{\Sigma}$ even when the number of assets is larger than
the sample size. Two factor-based methods are proposed. The first estimator assumes the factors
to be known and observable. The second method deals with the case in which common factors are
unknown. This is particularly important for analyzing many non-U.S. markets, where assets'
returns are driven by a few unknown factors. In both cases, the factor model imposes a
conditionally sparse structure, in that the idiosyncratic covariance is a large sparse matrix.
This leads to an approximate factor model as in Chamberlain and Rothschild (1983), with
a non-diagonal error covariance matrix.

Secondly, we provide a new and practical method to assess the accuracy of risk estimation
$w'(\widehat{\Sigma} - \Sigma)w$. In the literature (e.g., Fan et al. 2012), this term has been
bounded by

$$\xi_T = \|w\|_1^2 \, \|\widehat{\Sigma} - \Sigma\|_{\max},$$

where $\|w\|_1$, the $L_1$-norm of $w$, is the gross exposure of the portfolio, which is bounded
when there are no extreme positions in the portfolio. However, this upper bound depends on the
unknown $\Sigma$, hence is not applicable in practice. In addition, the numerical studies in
this paper demonstrate that this upper bound is too crude: it is often of the same or even
larger scale than the estimated risk. In contrast, we provide a high-confidence level upper
bound (H-CLUB) for $w'(\widehat{\Sigma} - \Sigma)w$, which is of much smaller scale and easy to
compute in practice. H-CLUB is constructed based on the confidence interval for the true risk.
For each proposed risk estimator $w'\widehat{\Sigma}w$ and a given $\tau \in (0, 1)$, we find an
H-CLUB $\widehat{U}(\tau)$ such that

$$P\big(|w'(\widehat{\Sigma} - \Sigma)w| \le \widehat{U}(\tau)\big) \to 1 - \tau.$$

In contrast, $P(|w'(\widehat{\Sigma} - \Sigma)w| \le \xi_T) = 1$. Hence H-CLUB is an upper bound
for the risk estimation error with high confidence, while the traditional bound is of full
confidence.

The third contribution is that for the first time in the literature, we derive the inferential 
theory of the risk estimators with a high-dimensional portfolio, especially when the estimator 
is factor-based with either observed or unobserved factors. Although factor analysis has long 
been used for the portfolio allocation theory, it remains largely unknown whether the effects 
of estimating the factor loadings and unobservable factors are negligible in risk estimation, 
especially when the dimensionality is high. This paper proves that these effects are indeed 
asymptotically negligible for diversified portfolios, even when the dimensionality is much 
larger than the sample size. Interestingly, we find that when the dimensionality is larger 
than the sample size, the factor-based risk estimators have the same asymptotic variances 
no matter whether the factors are known or not, and they are asymptotically equivalent. 
Hence the high dimensionality is in fact a blessing for risk estimation instead of a curse from
this point of view. In addition, the asymptotic variance of factor-based estimators is slightly
smaller than that of the sample covariance-based estimator, but the difference is small. This
demonstrates that the benefit of using a factor model is not in terms of a much smaller
asymptotic variance, because the systematic risk cannot be diversified. Rather, factor analysis
gives a strictly positive definite covariance estimator, which is essential to estimate the
optimal portfolio allocation vector, and also helps interpret the structure of the portfolio
risks.

Finally, using our simulated results based on a model calibrated from real U.S. equity
market data, we are able to quantify the relative estimation error, or coefficient of
variation, defined as $\mathrm{STD}(w'\widehat{\Sigma}w)/w'\Sigma w$, where
$\mathrm{STD}(\cdot)$ denotes the standard error of the estimated risk. Interestingly, this
ratio is just a few percent and is approximately independent of the gross exposure $\|w\|_1$
but sensitive to the length of the time series. On the other hand, we also quantify the
relation between the crude bound and the practical H-CLUB. We find that $\xi_T$ is many times
larger than $\widehat{U}(\tau)$, and the ratio $\xi_T/\widehat{U}(\tau)$ increases as the gross
exposure increases.

We also contribute to the portfolio theory by introducing a sampling technique which 
picks a random portfolio with a given gross exposure level. This sampling scheme can be 
useful for portfolio optimization and understanding the overall risks within a given level of 
gross exposure. 

We emphasize that the recent works by Fan et al. (2011, 2013) are only concerned about 
covariance estimations and no inferential theories were studied. In contrast, we focus on 
the risk estimation, with a particular attention to the risk assessment and the impact of 
covariance estimation on the limiting distributions of risk estimators. 

The rest of the paper is organized as follows. Section 2 proposes new risk estimators 
based on factor analysis under both known and unknown factor cases. Section 3 constructs 
the H-CLUB for each risk estimator based on the confidence interval for risks. It also derives 
the limiting distributions of the risk estimators and compares their asymptotic variances. 
Section 4 presents simulation results. An empirical study is considered in Section 5. Finally,
Section 6 concludes. All the proofs are given in the appendix. 

Throughout the paper, $\|w\|_1 = \sum_{i=1}^N |w_i|$ is used to denote the gross exposure of a
given portfolio allocation vector. For a square matrix $A$, $\lambda_{\min}(A)$ and
$\lambda_{\max}(A)$ represent its minimum and maximum eigenvalues. Let $\|A\|_{\max}$ and
$\|A\|$ denote its element-wise sup-norm and operator norm, given by
$\|A\|_{\max} = \max_{i,j} |A_{ij}|$ and $\|A\| = \lambda_{\max}^{1/2}(A'A)$ respectively.

2 Estimation of Portfolio's Risks 

Let $\{R_t\}_{t=1}^T$ be a time series of an $N \times 1$ vector of observed asset returns and
$\Sigma = \mathrm{cov}(R_t)$, often known as the volatility matrix. The portfolio risk of a
given allocation vector $w$ is given by $\sqrt{\mathrm{var}(w'R_t)}$, which is
$\sqrt{w'\Sigma w}$. How should one estimate the risk of a large portfolio? A straightforward
answer is $\sqrt{w'\widehat{\Sigma}w}$ with an estimator $\widehat{\Sigma}$. But how good is it?
How can the accuracy of this estimator be assessed? We address the problem of risk estimation
in this section. The assessment of the estimation accuracy will be discussed in Section 3.

The problem of estimating the risk of a given portfolio is challenging due to the high
dimensionality of $\Sigma$. Often the number of assets $N$ is in the hundreds or even
thousands. On the other hand, to adapt to the current market condition, a short period of
financial data is often used. For example, the number of daily returns in three months is only
in the tens. Hence $N$ can be much larger than $T$. We assume $\Sigma$ to be time-invariant
within a short period, which holds approximately for locally stationary time series.

We consider three estimators of $\mathrm{var}(w'R_t)$ for a given $w$, based on three
different estimators $\widehat{\Sigma}$: the sample covariance estimator, and factor analysis
with either observed or unobserved factors. Recently, Chang and Tsay (2010) proposed a Cholesky
decomposition approach to estimate the large covariance matrix, and used simulation to assess
its performance. On the other hand, the assets' returns are usually driven by a few market
factors. Due to the presence of these common factors, $\Sigma$ itself is not sparse. Moreover,
as pointed out by Stock and Watson (2002) and Bai (2003), common factors are usually pervasive,
so the factor loading matrix is not sparse either. Hence the factor-based risk estimators are
widely applicable in analyzing financial data, and their asymptotic properties (as both
$T, N \to \infty$) will also be presented below.

2.1 Sample-covariance-based estimator 

The first estimator $\widehat{\Sigma} = S$ is the conventional sample covariance matrix based
on $\{R_t\}_{t=1}^T$. The asymptotic impact of using $S$ on risk management has been studied by
Fan et al. (2008, 2012) when $N$ is much larger than $T$. The sample covariance estimator does
not require any structural assumption on the assets' returns. It was shown by the
aforementioned authors that for a given portfolio $w$ with a bounded gross exposure (that is,
$\|w\|_1$ is bounded),

$$w'(S - \Sigma)w = O_p\big(\sqrt{\log N / T}\big).$$

However, when $N > T$, it is well known that $S$ is singular, and therefore may result in an
estimated risk being zero for certain portfolios.
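
The singularity problem can be seen in a few lines. The Python sketch below is an illustration under stated assumptions, not part of the paper's procedure: it takes $N = 3$ assets and only $T = 2$ mean-zero observations, builds a portfolio orthogonal to both observed return vectors via a cross product, and shows that its estimated variance under the sample covariance is numerically zero while its true variance is not. The true covariance $0.04 \cdot I_3$ and all function names are hypothetical choices.

```python
import random


def sample_cov_quadratic_form(w, returns):
    """w'Sw for S = T^{-1} sum_t R_t R_t' (mean-zero returns).

    This equals the average of (w'R_t)^2 over the sample.
    """
    return sum(sum(wi * ri for wi, ri in zip(w, r)) ** 2 for r in returns) / len(returns)


def cross(a, b):
    """Cross product of two 3-vectors: orthogonal to both a and b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


rng = random.Random(0)
# N = 3 assets, T = 2 observations, true covariance 0.04 * I_3 (illustrative).
returns = [[rng.gauss(0.0, 0.2) for _ in range(3)] for _ in range(2)]

w = cross(returns[0], returns[1])      # lies in the null space of S
gross = sum(abs(x) for x in w)
w = [x / gross for x in w]             # normalize gross exposure to 1

est_var = sample_cov_quadratic_form(w, returns)   # numerically zero
true_var = 0.04 * sum(x * x for x in w)           # strictly positive
print(est_var, true_var)
```

With $N > T$ in general, any $w$ orthogonal to all observed return vectors has the same property, which is what motivates the factor-based estimators.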
2.2 Estimating risks based on factor analysis

To overcome the problem of singularity of the sample covariance under high dimensionality,
we assume that $R_t$ satisfies an "approximate factor model" (Chamberlain and Rothschild
1983):

$$R_t = B f_t + u_t, \quad t \le T, \qquad (2.1)$$

where $B$ is an $N \times K$ matrix of factor loadings, $f_t$ is a $K \times 1$ vector of
common factors, and $u_t$ is an $N \times 1$ vector of idiosyncratic error components. In
contrast to $N$ and $T$, here $K$ is assumed to be fixed. The common factors may or may not be
observable. For example,
Fama and French (1992, 1993) identified three known factors that have successfully described 
the U.S. stock market. In addition, macroeconomic and financial market variables have been 
thought to capture systematic risks as observable factors. On the other hand, in an empirical 
study, Bai and Ng (2002) determined two unobservable factors for stocks traded on the New 
York Stock Exchange during 1994-1998. 

Let $\mathrm{cov}(f_t)$ and $\Sigma_u = \mathrm{cov}(u_t)$ denote the covariance matrices of
$f_t$ and $u_t$, of dimensions $K \times K$ and $N \times N$ respectively. Suppose $f_t$ and
$u_t$ are uncorrelated. The factor model then implies the following decomposition of $\Sigma$:

$$\Sigma = B\,\mathrm{cov}(f_t)\,B' + \Sigma_u.$$

Sparsity is one of the most common structures for large covariance estimation, which assumes
many off-diagonal elements of the covariance to be either zero or nearly so. In the
approximate factor model, a natural assumption is to place a sparse structure on $\Sigma_u$.
The rationale is that, after the common factors are taken out, the remaining idiosyncratic
components should be mostly weakly correlated with each other. Such a condition is called
conditional sparsity. We now propose new risk estimators based on the conditional sparsity
assumption.

2.2.1 Factor-based estimator 

We first assume that the common factors are observable, and construct an estimator of
$\Sigma$ based on thresholding the covariance matrix of idiosyncratic errors. Suppose
$\widehat{B}$ is the least squares estimator of $B$. The residual sample covariance matrix of
$u_t$ is then given by

$$S_u = T^{-1} \sum_{t=1}^T \widehat{u}_t \widehat{u}_t' = (s_{u,ij})_{N \times N}, \qquad \widehat{u}_t = R_t - \widehat{B} f_t.$$

Let $s_{ij}(\cdot) : \mathbb{R} \to \mathbb{R}$ be an entry-dependent adaptive thresholding
function such that, for some thresholding parameter $\tau_{ij} > 0$,

$$s_{ij}(z) = 0 \ \text{when} \ |z| \le \tau_{ij}, \quad \text{and} \quad |s_{ij}(z) - z| \le \tau_{ij}. \qquad (2.2)$$

A simple example is $s_{ij}(z) = z\,1\{|z| > \tau_{ij}\}$ with
$\tau_{ij} = \tau (s_{u,ii} s_{u,jj})^{1/2}$, namely, setting all correlation coefficients
smaller than $\tau$ to zero. This rule is called hard thresholding in the literature. The
soft-thresholding rule is given by $s_{ij}(z) = \mathrm{sgn}(z)(|z| - \tau_{ij})_+$.

Let $\widehat{\mathrm{cov}}(f_t)$ denote the sample covariance of the common factors. Define
the estimated covariance matrices as

$$\widehat{\Sigma}_f = \widehat{B}\,\widehat{\mathrm{cov}}(f_t)\,\widehat{B}' + \widehat{\Sigma}_u, \qquad \widehat{\Sigma}_u = (\widehat{\sigma}_{u,ij})_{N \times N}, \qquad (2.3)$$

where the off-diagonal entries of $\widehat{\Sigma}_u$ are the thresholded residual
covariances $s_{ij}(s_{u,ij})$.

The first condition in (2.2) performs the thresholding. When applied to a sample covariance,
it removes most of the small entries that are likely due to estimation errors. The second
condition in (2.2) is used for "shrinkage", which helps to produce a positive definite
covariance estimator for a given finite sample. Commonly used examples of $s_{ij}(\cdot)$
include hard thresholding, soft thresholding, SCAD thresholding, etc. See Antoniadis and Fan
(2001), Rothman et al. (2009) and Cai and Liu (2011) for details. The cut-off is taken to be,
for some $C > 0$,

$$\tau_{ij} = C \sqrt{s_{u,ii}\, s_{u,jj}} \sqrt{\frac{\log N}{T}},$$

which corresponds to applying the thresholding with parameter $C\sqrt{\log N / T}$ to the
correlation matrix of $S_u$. One can adjust $C$ to gain the strict positive definiteness of
$\widehat{\Sigma}_f$ for any given finite sample (see the discussion in Fryzlewicz 2012).
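
As a minimal sketch of the thresholding step, the Python code below implements the hard and soft rules and applies them to the off-diagonal entries of a small residual covariance matrix with the entry-adaptive cut-off $C\sqrt{s_{u,ii}s_{u,jj}}\sqrt{\log N/T}$. The function names, the default `c=0.5`, the choice to leave the diagonal untouched, and the example matrix are assumptions made for illustration.

```python
import math


def hard_threshold(z, tau):
    """Keep z when |z| > tau, otherwise set it to zero."""
    return z if abs(z) > tau else 0.0


def soft_threshold(z, tau):
    """Shrink z toward zero by tau: sgn(z) * (|z| - tau)_+."""
    return math.copysign(max(abs(z) - tau, 0.0), z)


def threshold_matrix(s_u, n_obs, c=0.5, rule=soft_threshold):
    """Threshold the off-diagonal entries of a residual covariance matrix.

    s_u   : N x N residual sample covariance (list of lists)
    n_obs : sample size T
    c     : user-chosen constant C (0.5 here is only an illustrative default)
    Diagonal entries are kept as they are (an assumption of this sketch).
    """
    n = len(s_u)
    rate = c * math.sqrt(math.log(n) / n_obs)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                out[i][j] = s_u[i][j]
            else:
                tau_ij = rate * math.sqrt(s_u[i][i] * s_u[j][j])
                out[i][j] = rule(s_u[i][j], tau_ij)
    return out


# Hypothetical 3 x 3 residual covariance with one small and one sizeable off-diagonal entry.
s_u = [[0.04, 0.001, 0.02],
       [0.001, 0.04, 0.0],
       [0.02, 0.0, 0.04]]
soft = threshold_matrix(s_u, n_obs=60)
hard = threshold_matrix(s_u, n_obs=60, rule=hard_threshold)
print(soft[0][1], soft[0][2], hard[0][2])
```

Both rules set the small entry to zero; the soft rule additionally shrinks the surviving entry by the cut-off, while the hard rule leaves it unchanged.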



2.2.2 POET estimator 

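When the common factors are unobservable, we estimate $\Sigma$ by "principal orthogonal
complements thresholding" (POET), recently proposed by Fan et al. (2013). The POET
works as follows: Let $\widehat{\lambda}_1 \ge \cdots \ge \widehat{\lambda}_N$ be the ordered
eigenvalues of the sample covariance $S$, whose corresponding eigenvectors are denoted by
$\{\widehat{\xi}_j\}_{j=1}^N$. We then estimate $\Sigma$ by

$$\widehat{\Sigma}_P = \sum_{i=1}^K \widehat{\lambda}_i \widehat{\xi}_i \widehat{\xi}_i' + \widehat{\Sigma}_u, \qquad \widehat{\Sigma}_u = (\widehat{\sigma}_{u,ij})_{N \times N}, \qquad \widehat{\sigma}_{u,ij} = \begin{cases} \widehat{R}_{ii}, & i = j, \\ s_{ij}(\widehat{R}_{ij}), & i \ne j, \end{cases}$$

with $(\widehat{R}_{ij})_{N \times N} = \sum_{k=K+1}^N \widehat{\lambda}_k \widehat{\xi}_k \widehat{\xi}_k'$
the principal orthogonal complement, where $s_{ij}(\cdot)$ is the same adaptive thresholding
function as before, based on an entry-dependent threshold $\tau_{ij}$.

Recall that $K$ denotes the number of common factors. Here $C$ is a user-specified constant
to maintain the finite sample positive definiteness. Thanks to the thresholding, even when
$T = o(N)$, there is $C^* > 0$ such that for any $C > C^*$, both $\widehat{\Sigma}_f$ and
$\widehat{\Sigma}_P$ are strictly positive definite with probability approaching one. Simulated
and empirical studies suggest that $C = 0.5$ is a good choice when $s_{ij}$ is the soft
thresholding.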
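
The POET recipe can be sketched for the simplest case of a single unknown factor ($K = 1$). The Python code below is a hedged illustration, not the reference implementation: it extracts the leading eigenpair of the sample covariance by power iteration, forms the principal orthogonal complement, and soft-thresholds its off-diagonal entries. The cut-off $C\sqrt{\widehat{R}_{ii}\widehat{R}_{jj}}\sqrt{\log N/T}$ is assumed by analogy with the known-factor case, and all names and simulation settings are hypothetical.

```python
import math
import random


def sample_cov(returns):
    """S = T^{-1} sum_t R_t R_t' for mean-zero returns (T rows, N columns)."""
    t, n = len(returns), len(returns[0])
    return [[sum(r[i] * r[j] for r in returns) / t for j in range(n)] for i in range(n)]


def leading_eigenpair(a, n_iter=500):
    """Power iteration for the largest eigenvalue and eigenvector of a symmetric PSD matrix."""
    n = len(a)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in av))
        v = [x / norm for x in av]
    lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def poet_one_factor(returns, c=0.5):
    """One-factor POET sketch: rank-one part plus soft-thresholded complement."""
    s = sample_cov(returns)
    t, n = len(returns), len(s)
    lam, xi = leading_eigenpair(s)
    # Principal orthogonal complement: what remains after removing the first component.
    resid = [[s[i][j] - lam * xi[i] * xi[j] for j in range(n)] for i in range(n)]
    rate = c * math.sqrt(math.log(n) / t)   # assumed cut-off rate
    est = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            low_rank = lam * xi[i] * xi[j]
            if i == j:
                est[i][j] = low_rank + resid[i][i]
            else:
                tau_ij = rate * math.sqrt(max(resid[i][i], 0.0) * max(resid[j][j], 0.0))
                z = resid[i][j]
                est[i][j] = low_rank + math.copysign(max(abs(z) - tau_ij, 0.0), z)
    return est


# Simulated one-factor returns (illustrative settings only).
rng = random.Random(1)
n_assets, n_obs = 8, 40
loadings = [0.5 + rng.random() for _ in range(n_assets)]
returns = []
for _ in range(n_obs):
    f = rng.gauss(0.0, 1.0)
    returns.append([b * f + rng.gauss(0.0, 0.5) for b in loadings])

sigma_poet = poet_one_factor(returns)
w = [1.0 / n_assets] * n_assets
risk = math.sqrt(sum(w[i] * sigma_poet[i][j] * w[j]
                     for i in range(n_assets) for j in range(n_assets)))
print(risk)
```

In practice one would use a linear algebra library for the eigendecomposition and allow $K > 1$; power iteration is used here only to keep the sketch dependency-free.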



Based on the factor analysis, our proposed risk estimator is either
$\sqrt{w'\widehat{\Sigma}_f w}$ or $\sqrt{w'\widehat{\Sigma}_P w}$ for a given portfolio
allocation vector $w$, depending on whether $f_t$ is observable. Note that Fan et al. (2013) is
concerned only with covariance estimation. In contrast, this paper focuses on the asymptotic
behavior of these risk estimators and their assessment for a given diversified $w$, which have
never been addressed before. We will see that under high dimensionality, the two factor-based
estimators have the same asymptotic variance, which is smaller than that of the sample
covariance-based estimator. The effect of estimating the unknown factors on the limiting
distributions is asymptotically negligible.



3 Assessment of the Risk Estimation 

This section proposes a new method to assess the estimated risks for a given portfolio
allocation vector $w$. We will assume $\|w\|_1 \le c$ for some $c > 0$, where $\|w\|_1$ is the
gross exposure of the portfolio. This prevents extreme positions.



3.1 Measuring risks using full confidence bound 

As described in Section 2, we use a covariance estimator $\widehat{\Sigma}$ to form a risk
estimator $w'\widehat{\Sigma}w$. A natural question then arises: how close is the estimated
risk to the true risk? In other words, how do we assess
$\Delta = |w'(\widehat{\Sigma} - \Sigma)w|$? Technically this question is challenging under
high dimensionality. A simple inequality such as
$\Delta \le \|w\|^2 \|\widehat{\Sigma} - \Sigma\|$ would not give a convergent upper bound when
$N$ is large. An alternative (and commonly used) upper bound for $\Delta$ is based on the
following inequality:

$$\Delta \le \|w\|_1^2 \, \|\widehat{\Sigma} - \Sigma\|_{\max} = \xi_T, \qquad (3.1)$$

which is usually tighter for risk assessment.

However, for the purposes of statistical inference, $\xi_T$ is infeasible as it depends on the
true $\Sigma$. As a result, this upper bound cannot be evaluated in practice for a given data
set. In addition, our simulation results have shown that the upper bound is actually too crude
to be useful. Let us consider the following toy example.

Example 3.1. Consider three stocks with annualized returns that jointly follow a multivariate
Gaussian distribution $N_3(0, \Sigma)$ where $\Sigma = 0.04 \cdot I_3$. An equally weighted
portfolio $w = (1/3, 1/3, 1/3)'$ is constructed and the task is to estimate the portfolio risk
using the sample covariance matrix $S$ based on the simulated 21-day (one month) returns.

The theoretical value of the portfolio variance is $w'\Sigma w = 0.0133$, which corresponds to
a true risk of 11.55% per annum. Based on a typical simulated data set, the estimated portfolio
variance is $w'Sw = 0.0131$, equivalent to a perceived risk of 11.43% per annum. Moreover,
$\xi_T = \|S - \Sigma\|_{\max} = 0.0248$, since $\|w\|_1 = 1$. Based on this upper bound, a
simple calculation shows that $w'\Sigma w \in [0, 0.0379]$, that is, the true risk
$\sqrt{w'\Sigma w}$ lies in $[0, 19.46\%]$, an interval that is too wide to be meaningful. □
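
The toy example can be re-created with a short simulation. The Python sketch below draws 21 observations from three independent Gaussian returns with variance 0.04, forms the mean-zero sample covariance, and computes the estimated variance, the true variance, and the crude bound $\|w\|_1^2\|S - \Sigma\|_{\max}$. The seed, the mean-zero form of the sample covariance, and the function name `toy_example` are assumptions of this sketch, so the numbers will differ from the particular draw reported in the example.

```python
import math
import random


def toy_example(seed=0, n_days=21, sigma2=0.04):
    """Simulate the three-stock example and return (w'Sw, w'Sigma w, crude bound)."""
    rng = random.Random(seed)
    n = 3
    w = [1.0 / n] * n
    sd = math.sqrt(sigma2)
    returns = [[rng.gauss(0.0, sd) for _ in range(n)] for _ in range(n_days)]
    # Mean-zero sample covariance S = T^{-1} sum_t R_t R_t'.
    s = [[sum(r[i] * r[j] for r in returns) / n_days for j in range(n)] for i in range(n)]
    true_cov = [[sigma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    est_var = sum(w[i] * s[i][j] * w[j] for i in range(n) for j in range(n))
    true_var = sum(w[i] * true_cov[i][j] * w[j] for i in range(n) for j in range(n))
    max_err = max(abs(s[i][j] - true_cov[i][j]) for i in range(n) for j in range(n))
    crude_bound = sum(abs(x) for x in w) ** 2 * max_err
    return est_var, true_var, crude_bound


est_var, true_var, crude_bound = toy_example(seed=0)
print(math.sqrt(true_var))            # true risk, about 0.1155
print(math.sqrt(est_var))             # perceived risk for this draw
print(abs(est_var - true_var), crude_bound)   # actual error versus crude bound
```

Comparing the last two printed numbers across seeds gives a feel for how loose the full-confidence bound is relative to the error actually incurred.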



Note that the inequality (3.1) holds for every sampling sequence $\{R_t\}_{t=1}^T$. Hence
$\xi_T$ is in fact an upper bound of full confidence, that is,

$$P(|w'(\widehat{\Sigma} - \Sigma)w| \le \xi_T) = 1.$$

The toy example is typical in the sense that $\xi_T$ is already too crude for small
portfolios. In statistical inference, one often uses bounds of high confidence levels instead,
e.g., quantities that bound $\Delta$ with a high probability. This paper pursues such a
high-confidence-level upper bound (H-CLUB) based on the confidence interval.



3.2 H-CLUB 

We propose a new confidence upper bound for $\Delta = |w'(\widehat{\Sigma} - \Sigma)w|$ to
assess the estimation error of the portfolio risks. More specifically, for each proposed matrix
estimator $\widehat{\Sigma}$ and any given $\tau > 0$, we find a quantity $\widehat{U}(\tau)$
such that for all large $N$ and $T$,

$$P(|w'(\widehat{\Sigma} - \Sigma)w| \le \widehat{U}(\tau)) \ge 1 - \tau.$$

Therefore, $\widehat{U}(\tau)$ is an asymptotic $(1 - \tau)100\%$ confidence upper bound for
$\Delta$. In addition, it is data-driven (up to user-specified tuning parameters), hence can be
easily calculated in practice and used to construct confidence intervals for the true risks.

Before proceeding, we make a technical comment that one needs to be careful about the
limiting behaviors of $T$ and $N$. In this paper, we will treat $N$ as an increasing function
of $T$. Hence $N$ grows via a fixed trajectory, e.g., $N = N_T = T^{\alpha}$ for some
$\alpha > 0$, and can grow faster than $T$, namely, $\alpha > 1$. As a result, we need to apply
the triangular array central limit theorem with weakly dependent time series data.



3.3 Sample covariance based risk estimator 

Let us start with the sample covariance matrix of $R_t$. For simplicity of exposition, let
us assume that the returns have mean zero and $S = T^{-1}\sum_{t=1}^T R_t R_t'$. We make the
following assumptions, under which the serial dependence across $t$ is allowed.

Assumption 3.1. (i) $\{R_t\}_{t=1}^T$ is strictly stationary with $E R_t = 0$ and
$\mathrm{cov}(R_t) = \Sigma$.
(ii) There is $M > 0$ such that $\max_{i \le N} E|R_{it}|^{\ldots} < M$.

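Let us introduce the strong mixing condition. Let $\mathcal{F}_{-\infty}^{0}(R)$ and
$\mathcal{F}_{T}^{\infty}(R)$ denote the $\sigma$-algebras generated by
$\{R_t : -\infty < t \le 0\}$ and $\{R_t : T \le t < \infty\}$ respectively. In addition,
define the mixing coefficient
$\alpha_R(T) = \sup_{A \in \mathcal{F}_{-\infty}^{0}(R),\, B \in \mathcal{F}_{T}^{\infty}(R)} |P(A)P(B) - P(AB)|$.

Define the autocovariance function
$\gamma_T(h) = \mathrm{cov}((w'R_t)^2, (w'R_{t+h})^2)$, which depends on $T$ through
$\dim(w) = N = N_T$. Let

$$\sigma^2 = \gamma_T(0) + 2\sum_{h=1}^{\infty} \gamma_T(h). \qquad (3.2)$$

Assumption 3.2. (i) There exist $r_0 > 0$ and $M > 0$ such that for all $T \in \mathbb{Z}^+$,

$$\alpha_R(T) \le \exp(-M T^{r_0}).$$

(ii) $\sum_{h=1}^{\infty} |\gamma_T(h)| = O(1)$, $\sum_{h=1}^{T} h\gamma_T(h)/T = o(\sigma^2)$
and $\alpha_R(T) = o(\gamma_T(0))$.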



Assumption 3.2 requires weak dependence of the time series, in the form of a strong mixing
condition. The first two conditions in (ii) are usually mild. When a diversified $w$ is used,
the last condition in (ii) is easy to satisfy as long as the dimensionality is not
exponentially large in $T$, because the mixing coefficient is assumed to decay exponentially
fast. To illustrate its meaning, consider a simple case where $w = (1/N, \ldots, 1/N)'$; then
$\gamma_T(0) = \mathrm{var}((N^{-1}\sum_{i=1}^N R_{it})^2)$, which is in general no smaller
than $O(N^{-c})$ for some $c > 0$. Due to the strong mixing condition in Assumption 3.2(i),
$\alpha_R(T) = o(\gamma_T(0))$ if $N = N_T$ grows at a polynomial rate of $T$.

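We are now ready to define the H-CLUB for the estimation error $w'(S - \Sigma)w$. Let

$$\widehat{\gamma}(h) = T^{-1}\sum_{t=1}^{T-h} \big((w'R_t)^2 - w'Sw\big)\big((w'R_{t+h})^2 - w'Sw\big).$$

In particular, $\widehat{\gamma}(0) = T^{-1}\sum_{t=1}^{T} (w'R_t)^4 - (w'Sw)^2$. Let
$z_{\tau/2}$ denote the upper $\tau/2$ quantile of the standard normal distribution. For some
increasing sequence $L = L(T) \to \infty$, let

$$\widehat{\sigma}^2 = \widehat{\gamma}(0) + 2\sum_{h=1}^{L} \widehat{\gamma}(h), \qquad \widehat{U}_S(\tau) = z_{\tau/2}\sqrt{\widehat{\sigma}^2 / T}. \qquad (3.3)$$

Here $L$ is a truncation parameter, and as $L$ slowly increases, $\widehat{\sigma}^2$
consistently estimates $\sigma^2$.
Lemma 3.1. Under Assumptions 3.1 and 3.2,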



h>L 



If in addition = o(Ta^) and J^hyL'^W ~ ^i^r)' 



_ ^2 I ^ Op(cr^) and Us{t) = | 



The following theorem gives the limiting distribution of the estimated risk. It also
demonstrates that $\widehat{U}_S(\tau)$ is a valid H-CLUB for $|w'(S - \Sigma)w|$.

Theorem 3.1. Under the assumptions of Lemma 3.1, as $T, N \to \infty$,

$$\Big[\mathrm{var}\Big(\sum_{t=1}^{T} (w'R_t)^2\Big)\Big]^{-1/2} T\, w'(S - \Sigma)w \to^d N(0, 1),$$

and for any $\tau > 0$,

$$P\big(|w'(S - \Sigma)w| \le \widehat{U}_S(\tau)\big) \to 1 - \tau.$$

As a result, $\widehat{U}_S(\tau) = z_{\tau/2}\sqrt{\widehat{\sigma}^2/T}$ is an H-CLUB with
confidence level $(1 - \tau)100\%$ and is data-driven once a user-specified $L$ is determined.
Compared to the traditional bound $\xi_T$, $\widehat{U}_S(\tau)$ can be easily calculated for
any given time series data. The scale of $\widehat{U}_S(\tau)$ is much smaller than that of
$\xi_T$. Our simulation results show that even for a small $\tau$ (e.g., $\tau = 0.01$), the
magnitude of $\widehat{U}_S(\tau)$ is much smaller than the crude bound $\xi_T$. (See the table
in Section 4.)
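
A minimal sketch of how the sample-covariance H-CLUB in (3.3) might be computed is given below. It follows the formulas for $\widehat{\gamma}(h)$, $\widehat{\sigma}^2$ and $\widehat{U}_S(\tau)$ with mean-zero returns. The function name, the default truncation lag (the integer nearest $T^{1/3}$), and the clipping of a negative truncated sum at zero are assumptions of this sketch, not prescriptions from the paper.

```python
import math
from statistics import NormalDist


def hclub_sample_cov(returns, w, tau=0.05, n_lags=None):
    """Return (U_S(tau), w'Sw) for mean-zero returns given as T rows of length N."""
    t = len(returns)
    if n_lags is None:
        n_lags = int(round(t ** (1.0 / 3.0)))   # hypothetical default for L
    x = [sum(wi * ri for wi, ri in zip(w, r)) ** 2 for r in returns]   # (w'R_t)^2
    risk_var = sum(x) / t                                              # w'Sw
    d = [xt - risk_var for xt in x]

    def gamma_hat(h):
        # T^{-1} sum_{t=1}^{T-h} ((w'R_t)^2 - w'Sw)((w'R_{t+h})^2 - w'Sw)
        return sum(d[s] * d[s + h] for s in range(t - h)) / t

    sigma2 = gamma_hat(0) + 2.0 * sum(gamma_hat(h) for h in range(1, n_lags + 1))
    sigma2 = max(sigma2, 0.0)   # guard added here: the truncated sum can be negative
    z = NormalDist().inv_cdf(1.0 - tau / 2.0)   # upper tau/2 normal quantile
    return z * math.sqrt(sigma2 / t), risk_var


# Tiny hand-checkable data set: 4 observations of 2 assets, equal weights.
demo_returns = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
u_s, var_hat = hclub_sample_cov(demo_returns, [0.5, 0.5], tau=0.05, n_lags=0)
print(var_hat, u_s)
```

For the demo data, $(w'R_t)^2$ takes the values 0.25, 0.25, 1, 0, so $w'Sw = 0.375$ and $\widehat{\gamma}(0) = 0.140625$; with no lags the bound is $z_{0.025}\sqrt{0.140625/4} = 0.1875\,z_{0.025}$.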
By the $\delta$-method, we have the following corollary for the risk estimation. Define

$$\widehat{R}(w) = \sqrt{w'Sw}, \qquad R(w) = \sqrt{w'\Sigma w}.$$

Corollary 3.1. Under the assumptions of Lemma 3.1, for any $\tau > 0$, as $T, N \to \infty$,

$$P\Big(|\widehat{R}(w) - R(w)| \le \widehat{U}_S(\tau)/\sqrt{4 w'Sw}\Big) \to 1 - \tau.$$
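
The factor $\sqrt{4 w'Sw}$ in the corollary comes from linearizing the square root. A brief sketch of that step, written with $\widehat{R}(w) = \sqrt{w'Sw}$ and $R(w) = \sqrt{w'\Sigma w}$ and assuming $w'Sw / w'\Sigma w \to 1$ in probability, is:

```latex
\begin{align*}
\widehat{R}(w) - R(w)
  &= \sqrt{w'Sw} - \sqrt{w'\Sigma w}
   = \frac{w'(S - \Sigma)w}{\sqrt{w'Sw} + \sqrt{w'\Sigma w}} \\
  &\approx \frac{w'(S - \Sigma)w}{2\sqrt{w'Sw}}
   = \frac{w'(S - \Sigma)w}{\sqrt{4\,w'Sw}} .
\end{align*}
```

The first line is an exact identity; the approximation replaces $\sqrt{w'\Sigma w}$ in the denominator by its estimate, so a bound on the variance error translates into a bound on the risk error after division by $\sqrt{4 w'Sw}$.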

3.4 Factor-based risk estimator 

Let us now approach the problem via factor analysis. We assume

$$R_t = B f_t + u_t, \qquad (3.4)$$

where in this section, $\{f_t\}_{t=1}^T$ are observed common factors. In the approximate factor
model, the idiosyncratic covariance is non-diagonal. However, the risk component
$w'(\widehat{\Sigma}_u - \Sigma_u)w$ introduced by the idiosyncratic error can be diversified
away by a selected portfolio allocation vector. Hence the estimation error of the risk only
comes from the systematic error brought by the common factors. Compared to the sample
covariance based risk estimator, factor analysis always gives strictly positive risk estimators
even when $N > T$ for any nonzero allocation vector $w$.

For the factor-based risk estimation, a different set of assumptions is needed instead of
those in Section 3.3. First of all, the factor model is assumed to be conditionally sparse, in
the sense that $\Sigma_u$ is a sparse matrix. We employ the approximate sparsity assumption in
Bickel and Levina (2008) as follows:

Assumption 3.3. There is $q \in [0, 1)$ such that

$$s_N = \max_{i \le N} \sum_{j=1}^{N} |\Sigma_{u,ij}|^q = o\big(\min\{(T/\log N)^{(1-q)/2},\, N^{(1-q)/2}\}\big).$$

When $q = 0$ we define $s_N = \max_{i \le N} \sum_{j=1}^{N} 1\{\Sigma_{u,ij} \ne 0\}$ as the
maximum number of nonvanishing elements in each row, and the assumption requires that
$s_N = o(\min\{(T/\log N)^{1/2},\, N^{1/2}\})$.



Assumption 3.3, though slightly stronger than those in Chamberlain and Rothschild (1983),
is quite meaningful in practice. For example, when the idiosyncratic components represent
firms' individual shocks, they are either uncorrelated or weakly correlated among the firms
across different industries, because the industry specific components are not pervasive for
the whole economy (Connor and Korajczyk 1993).

Assumption 3.4. (i) $\{u_t, f_t\}_{t=1}^T$ is strictly stationary, $\{u_t\}_{t=1}^T$ and
$\{f_t\}_{t=1}^T$ are independent, and $E u_{it} = E f_{jt} = 0$ for all $i, j$.

(ii) There exist $r_1, r_2 > 0$ and $b_1, b_2 > 0$ such that for any $s > 0$, $i \le N$ and
$j \le K$,

$$P(|u_{it}| > s) \le \exp(-(s/b_1)^{r_1}), \qquad P(|f_{jt}| > s) \le \exp(-(s/b_2)^{r_2}).$$

(iii) There is $C > 0$ such that $C^{-1} < \lambda_{\min}(\Sigma_u) \le \lambda_{\max}(\Sigma_u) < C$,
$\max_{i \le N} E R_{it}^{\ldots} < C$, $\|B\|_{\max} < C$ and
$\lambda_{\min}(\mathrm{cov}(f_t)) > C^{-1}$.

Let $\mathcal{F}_{-\infty}^{0}$ and $\mathcal{F}_{T}^{\infty}$ denote the $\sigma$-algebras
generated by $\{(f_t, u_t) : -\infty < t \le 0\}$ and $\{(f_t, u_t) : T \le t < \infty\}$
respectively. Define the mixing coefficient
$\alpha_f(T) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{T}^{\infty}} |P(A)P(B) - P(AB)|$.
Let

$$\gamma_f(h) = \mathrm{cov}\big((w'Bf_t)^2, (w'Bf_{t+h})^2\big)$$

for $h \ge 0$. It follows from the $\alpha$-mixing condition that
$\sum_{h=1}^{\infty} |\gamma_f(h)| = O(1)$ (see Lemma B.5 in the appendix). In addition, define

$$\sigma_f^2 = \gamma_f(0) + 2\sum_{h=1}^{\infty} \gamma_f(h). \qquad (3.5)$$

Assumption 3.5. (i) There exist $r_3 > 0$ and $M > 0$ satisfying: for all $T \in \mathbb{Z}^+$,

$$\alpha_f(T) \le \exp(-M T^{r_3}).$$

(ii) $\sum_{h=1}^{T} h\gamma_f(h)/T = o(\sigma_f^2)$ and $\alpha_f(T) = o(\gamma_f(0))$ as
$T, N \to \infty$.

Assumption 3.6. $w'\Sigma_u w = o\big(\sigma_f^2 + \sigma_f T^{-q/2}(\log N)^{-(1-q)/2} s_N^{-1}\big)$.

Note that $\sigma_f^2 = O(1)$. This assumption allows $\sigma_f^2$ to decay as $N$ increases
due to diversified allocation vectors. Recall that $q$ is defined in Assumption 3.3.
Assumption 3.6 requires $\|w\| = o(1)$, which assumes a diversified portfolio to reduce the
idiosyncratic risk. To illustrate the intuition, consider the following simple example.

Example 3.2. Consider a one-factor model on the asset returns:

$$R_{it} = b_i f_t + u_{it},$$

where $\mathrm{var}(f_t^2) > 0$ and $\Sigma_u$ is a diagonal matrix. Hence $q = 0$ and
$s_N = 1$. For simplicity, suppose $\{f_t\}_{t=1}^T$ are independent across $t$, and thus
$\sigma_f^2 = \gamma_f(0) = (w'B)^4 \mathrm{var}(f_t^2)$. As the eigenvalues of $\Sigma_u$ are
bounded away from zero and infinity, Assumption 3.6 is equivalent to:

$$w'w = o\big((w'B)^4 + (w'B)^2/\sqrt{\log N}\big), \qquad (3.6)$$

which holds if $w$ is "diversified" enough. For example, the equal-weight allocation
$w = (1/N, \ldots, 1/N)'$ gives $w'\Sigma_u w = O(N^{-1})$. Writing
$c_N = |N^{-1}\sum_{i=1}^{N} b_i|$, then (3.6) holds as long as
$N^{-1}\sqrt{\log N} = o(c_N^2)$. This is true since $c_N$ is often bounded away from zero. □
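
The diversification condition in this example can be checked numerically. The Python sketch below takes the equal-weight portfolio and compares $w'w = 1/N$ with a right-hand side of the form $c_N^4 + c_N^2/\sqrt{\log N}$, where $c_N = |N^{-1}\sum_i b_i|$; this form is what the condition reduces to for $q = 0$, $s_N = 1$ and $\sigma_f^2$ proportional to $(w'B)^4$, and is used here as an assumption. The loadings $b_i = 1 + 0.5\sin(i)$ are purely hypothetical.

```python
import math


def diversification_ratio(n_assets, loadings):
    """Ratio w'w / (c_N^4 + c_N^2 / sqrt(log N)) for the equal-weight portfolio."""
    w = [1.0 / n_assets] * n_assets
    ww = sum(x * x for x in w)                  # equals 1/N
    c_n = abs(sum(loadings) / n_assets)         # |w'B| for equal weights
    rhs = c_n ** 4 + c_n ** 2 / math.sqrt(math.log(n_assets))
    return ww / rhs


ratios = {}
for n in (10, 100, 1000, 10000):
    b = [1.0 + 0.5 * math.sin(i) for i in range(n)]   # hypothetical loadings
    ratios[n] = diversification_ratio(n, b)
    print(n, ratios[n])
```

The ratio shrinks roughly like $1/N$ because the average loading stays away from zero, which is the sense in which an equal-weight portfolio is "diversified enough".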



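To construct H-CLUB, we need to first estimate $\gamma_f(h)$. With
$\widehat{\mathrm{cov}}(f_t)$ the sample covariance of the common factors, define

$$\widehat{\gamma}_f(h) = T^{-1}\sum_{t=1}^{T-h} \big[(w'\widehat{B}f_{t+h})^2 - w'\widehat{B}\,\widehat{\mathrm{cov}}(f_t)\,\widehat{B}'w\big]\big[(w'\widehat{B}f_t)^2 - w'\widehat{B}\,\widehat{\mathrm{cov}}(f_t)\,\widehat{B}'w\big],$$

where $\widehat{B}$ is the least squares estimator of $B$. For some $L = L(T)$, define

$$\widehat{\sigma}_f^2 = \widehat{\gamma}_f(0) + 2\sum_{h=1}^{L} \widehat{\gamma}_f(h), \qquad \widehat{U}_f(\tau) = z_{\tau/2}\sqrt{\widehat{\sigma}_f^2 / T}. \qquad (3.7)$$

Let $\beta = 3r_1^{-1} + 1.5r_2^{-1} + r_3^{-1}$.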



Lemma 3.2. Suppose {logNf^+^ = o{T) and L^{L + \ogN)/T + E/.>l7/(^) 
Under Assumptions \3.3 3. 



oiaj). 



^ L + logN ^ ,,,, 



h>L 



and 



Uf(r) 




\ogN 



T 



Hence the H-CLUB has a smaller stochastic order than that of the crude bound $\xi_T$, and can
be computed easily from the data in practice. The following theorem shows that
$\widehat{U}_f(\tau)$ is a valid H-CLUB for the risk estimation error. Technically, Theorems
3.2 and 3.3 (to be introduced in the next subsection) are not simple applications of the
triangular array central limit theorem. We need to show that after thresholding, the
idiosyncratic risk can be diversified away by the portfolio vector $w$, and the estimation
error for the factor loadings is asymptotically negligible even under high dimensionality.

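Theorem 3.2. Suppose that the common factors are observable, and that the thresholded
$\widehat{\Sigma}_f$ in (2.3) is used as the covariance estimator. Under the assumptions of
Lemma 3.2, as $T, N \to \infty$,

$$\Big[\mathrm{var}\Big(\sum_{t=1}^{T} (w'Bf_t)^2\Big)\Big]^{-1/2} T\, w'(\widehat{\Sigma}_f - \Sigma)w \to^d N(0, 1),$$

and for any $\tau > 0$,

$$P\big(|w'(\widehat{\Sigma}_f - \Sigma)w| \le \widehat{U}_f(\tau)\big) \to 1 - \tau.$$

Remark 3.1. Similar to Corollary 3.1, if we use $\widehat{R}_f(w) = \sqrt{w'\widehat{\Sigma}_f w}$
to estimate $R(w) = \sqrt{w'\Sigma w}$, then applying the delta method gives

$$P\Big(|\widehat{R}_f(w) - R(w)| \le \widehat{U}_f(\tau)/\sqrt{4 w'\widehat{\Sigma}_f w}\Big) \to 1 - \tau.$$

Hence $\widehat{U}_f(\tau)/\sqrt{4 w'\widehat{\Sigma}_f w}$ is a valid H-CLUB for
$|\widehat{R}_f(w) - R(w)|$.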
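
For a single observed factor, the factor-based bound can be sketched in a few lines. The Python code below estimates the loadings by least squares without intercept (returns and factor are taken as mean-zero), forms the lag-$h$ autocovariances of $(w'\widehat{B}f_t)^2$, and returns $z_{\tau/2}\sqrt{\widehat{\sigma}_f^2/T}$. The one-factor restriction, the mean-zero sample variance of the factor, the clipping at zero, and the function name are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def hclub_one_factor(returns, factor, w, tau=0.05, n_lags=0):
    """Factor-based H-CLUB sketch for one observed, mean-zero factor."""
    t, n = len(returns), len(w)
    f2 = sum(f * f for f in factor)
    # Least squares loading of each asset on the factor (no intercept).
    b_hat = [sum(returns[s][i] * factor[s] for s in range(t)) / f2 for i in range(n)]
    wb = sum(wi * bi for wi, bi in zip(w, b_hat))      # w'B_hat
    x = [(wb * f) ** 2 for f in factor]                # (w'B_hat f_t)^2
    center = wb * wb * f2 / t                          # w'B_hat cov_hat(f) B_hat'w
    d = [xt - center for xt in x]

    def gamma_hat(h):
        return sum(d[s] * d[s + h] for s in range(t - h)) / t

    sigma2_f = gamma_hat(0) + 2.0 * sum(gamma_hat(h) for h in range(1, n_lags + 1))
    sigma2_f = max(sigma2_f, 0.0)   # guard added here for small samples
    z = NormalDist().inv_cdf(1.0 - tau / 2.0)
    return z * math.sqrt(sigma2_f / t)


# Noise-free demo: two assets with unit loadings on the factor, equal weights.
demo_factor = [1.0, -1.0, 2.0, 0.0]
demo_returns = [[f, f] for f in demo_factor]
u_f = hclub_one_factor(demo_returns, demo_factor, [0.5, 0.5], tau=0.05, n_lags=0)
print(u_f)
```

In the demo the fitted loadings equal 1, the squared factor exposures are 1, 1, 4, 0 with mean 1.5, and $\widehat{\gamma}_f(0) = 2.25$, so the bound is $0.75\,z_{0.025}$. Only the factor series enters the autocovariances, reflecting that the idiosyncratic part is diversified away.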

It is interesting to compare $\widehat{U}_f(\tau)$ with $\widehat{U}_S(\tau)$ and see if
knowing the factor structure results in a reduced upper bound. This is equivalent to comparing
$\widehat{\sigma}$ in (3.3) with $\widehat{\sigma}_f$ in (3.7). Essentially we are to compare
the asymptotic variances of the estimated risks between a pure nonparametric risk estimator
(sample covariance) and an estimator based on factor analysis. We will see in the following
section that when the factor structure is specified, the factor-based risk estimator indeed
gives a slightly smaller asymptotic variance.



3.5 Risk estimation with unknown factors 

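Often the market assets' returns are driven by a few unknown factors. Hence the common
factors $f_t$ may not be observable, which makes the analysis more practical and challenging.
In this case, we apply the POET estimator for $\Sigma$ to handle the difficulty of not knowing
the factors:

$$\widehat{\Sigma}_P = \sum_{i=1}^{K} \widehat{\lambda}_i \widehat{\xi}_i \widehat{\xi}_i' + \widehat{\Sigma}_u, \qquad (3.8)$$

with $K$ being the number of common factors, $\widehat{\lambda}_i$ and $\widehat{\xi}_i$ the
leading eigenvalues and eigenvectors of the sample covariance, and $\widehat{\Sigma}_u$ the
thresholded idiosyncratic covariance estimator of Section 2.2.2. For simplicity, we will assume
$K$ to be known, and in practice it can be estimated consistently using the BIC method (Bai and
Ng 2002). Then $K$ in the above estimator can be replaced with its consistent estimator.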
Under the conditional sparsity condition, Fan et al. (2011, 2013) showed that the estimation
error of the POET estimator is of order

$$O_p\left(s_N\left(\frac{\log N}{T} + \frac{1}{N}\right)^{1/2-q/2}\right), \qquad (3.9)$$

and when common factors are observable, the corresponding error is of order

$$O_p\left(s_N\left(\frac{\log N}{T}\right)^{1/2-q/2}\right), \qquad (3.10)$$

where $q$ and $s_N$ are defined in Assumption 3.3. The term $1/N$ in (3.9) is the price for not
knowing $f_t$. When $T = o(N\log N)$, the above convergence rates are the same. Intuitively, as
the dimensionality increases, more information about the common factors is collected, and
eventually the common factors can be treated as though they are known. Moreover, both
$\hat\Sigma_P$ and $\hat\Sigma_f$ are strictly positive definite for all large N and T.

We will see that with a large enough pool of assets and a diversified portfolio allocation,
the effect of estimating the unknown factors on the estimated risk is negligible. As a result,
$w'\hat\Sigma_P w$ and $w'\hat\Sigma_f w$ have the same asymptotic limiting distribution. For this purpose, we
impose additional conditions.

Assumption 3.7. As $N \to \infty$, the eigenvalues of $B'B/N$ are bounded away from both zero
and infinity.

Intuitively, Assumption 3.7 means that the common factors should be pervasive, that is, they
impact a non-vanishing proportion of individual time series. It implies that the first K
eigenvalues of $\Sigma$ are growing with rate $O(N)$, which are well separated from the eigenvalues
of $\Sigma_u$. For identification, we assume $\mathrm{cov}(f_t) = I_K$ and $B'B$ to be diagonal. Consequently,

$$\Sigma = BB' + \Sigma_u.$$

Write $B = (b_1, \ldots, b_N)'$. The following assumption is common in the literature of
high-dimensional factor analysis, e.g., Bai and Ng (2002), Bai (2003).

Assumption 3.8. There is $M > 0$ such that $E[N^{-1/2}(u_s'u_t - Eu_s'u_t)]^4 < M$ and
$E\|N^{-1/2}\sum_{i=1}^{N} b_iu_{it}\|^4 < M$.



Motivated by (3.9) and (3.10), we require $T = o(N\log N)$ so that the effect of estimating
the common factors is first-order negligible. This is often true for asset return time
series data. In addition, the portfolio vector $w$ should still be diversified enough. This leads
to the following assumption:

Assumption 3.9. $\sigma_f^2T \to \infty$, $\sigma_f^2N/T \to \infty$, and $w'\Sigma_uw = o(\sigma_fs_N^{-1}N^{1/2-q/2}T^{-1/2})$.

Assumption 3.9 can be verified similarly by an example like Example 3.2.
To define an H-CLUB for a factor model with unknown factors, we first apply the principal
components method (Stock and Watson 2002 and Bai 2003) to estimate $\sigma_f^2$. Let $\hat F =
(\hat f_1, \ldots, \hat f_T)$ be a $K \times T$ matrix such that the rows of $\hat F/\sqrt{T}$ are the eigenvectors corresponding
to the K largest eigenvalues of the $T \times T$ matrix $R'R$, where $R = (R_1, \ldots, R_T)$. Let
$\hat B = R\hat F'/T$. Define

$$\hat\gamma_P(h) = T^{-1}\sum_{t=1}^{T-h}\left[(w'\hat B\hat f_{t+h})^2 - w'\hat B\hat B'w\right]\left[(w'\hat B\hat f_t)^2 - w'\hat B\hat B'w\right].$$

For some $L = L(T) \to \infty$, let

$$\hat\sigma_P^2 = \hat\gamma_P(0) + 2\sum_{h=1}^{L}\hat\gamma_P(h), \qquad \hat U_P(\tau) = z_{\tau/2}\sqrt{\hat\sigma_P^2/T}. \qquad (3.11)$$
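
The construction in (3.11) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function name `hclub_unknown_factors` and its arguments are hypothetical, the returns matrix `R` is assumed to be already demeaned, $z_{\tau/2}$ is taken to be the upper $\tau/2$ normal quantile, and the clipping of a negative long-run variance estimate at zero is a safeguard added here that is not part of the paper.

```python
import numpy as np
from statistics import NormalDist


def hclub_unknown_factors(R, w, K, L, tau=0.05):
    """Sketch of U_P(tau) in (3.11). R is an N x T matrix of demeaned returns."""
    N, T = R.shape
    # Eigenvectors of the T x T matrix R'R (eigh sorts eigenvalues ascending).
    _, vecs = np.linalg.eigh(R.T @ R)
    # Rows of F / sqrt(T) are the eigenvectors of the K largest eigenvalues.
    F = np.sqrt(T) * vecs[:, -K:].T      # K x T estimated factors
    B = R @ F.T / T                      # N x K estimated loadings
    x = (w @ B @ F) ** 2                 # (w' B f_t)^2 for t = 1, ..., T
    z = x - w @ B @ B.T @ w              # centre by w' B B' w

    def gamma(h):
        # Sample autocovariance at lag h, normalised by T as in the text.
        return np.sum(z[h:] * z[:T - h]) / T

    sigma2 = gamma(0) + 2.0 * sum(gamma(h) for h in range(1, L + 1))
    sigma2 = max(sigma2, 0.0)            # safeguard added here, not in the paper
    z_crit = NormalDist().inv_cdf(1.0 - tau / 2.0)
    return z_crit * np.sqrt(sigma2 / T)
```

Because $\hat F\hat F'/T = I_K$ by construction, the sample mean of $(w'\hat B\hat f_t)^2$ equals $w'\hat B\hat B'w$, so the centring term is exactly the sample mean of the series.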



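Lemma 3.3. Suppose $L = o(\sqrt{N}\sigma_f^2)$. Under Assumptions 3.3-3.9, $\hat\sigma_P^2$ converges to $\sigma_f^2$
up to an estimation error and the truncation term $\sum_{h>L}\gamma_f(h)$, and

$$\hat U_P(\tau) = o_p\left(\sqrt{\frac{\log N}{T}}\right).$$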



The following theorem shows that $\hat U_P(\tau)$ is an H-CLUB for $w'(\hat\Sigma_P - \Sigma)w$, and can be
computed easily from the data. Interestingly, $w'\hat\Sigma_Pw$ and $w'\hat\Sigma_fw$ have the same asymptotic
limiting distribution. The price paid for not knowing the factors is asymptotically negligible.



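Theorem 3.3. Suppose the common factors are unobservable, and $\hat\Sigma_P$ in (3.8) is used as the
covariance estimator. Under the assumptions of Lemma 3.3, as $T, N \to \infty$,

$$\left[\mathrm{var}\left(\sum_{t=1}^{T}(w'Bf_t)^2\right)\right]^{-1/2} T\,w'(\hat\Sigma_P - \Sigma)w \to^d N(0,1),$$

and for any $\tau > 0$,

$$P\left(|w'(\hat\Sigma_P - \Sigma)w| < \hat U_P(\tau)\right) \to 1 - \tau.$$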



Remark 3.2. Similarly, if we define $\hat R_P(w) = \sqrt{w'\hat\Sigma_Pw}$, then $\hat U_P(\tau)/\sqrt{4w'\hat\Sigma_Pw}$ is a valid
H-CLUB for $|\hat R_P(w) - R(w)|$.

Knowing the factor structure of the return $R_t$ improves the estimation efficiency relative
to the sample covariance estimator. This is demonstrated by the following theorem.

Theorem 3.4. Under the assumptions of Theorem 3.3,

$$\mathrm{var}\left(\sum_{t=1}^{T}(w'R_t)^2\right) \ge \mathrm{var}\left(\sum_{t=1}^{T}(w'Bf_t)^2\right).$$



The difference of the above two variances is actually small when w is diversified enough,
and this fact is further verified by our simulation results (see Tables 3 and 4 in Section 4). The
reason is that the systematic risk cannot be diversified. On the other hand, factor analysis
gives a strictly positive definite covariance estimator, whereas the sample covariance may
produce a zero risk estimate for certain portfolio allocation vectors. The positive
definiteness is particularly important for estimating the optimal portfolio allocation vector.
Furthermore, factor analysis interprets the structure of a portfolio's risks. It is clearly seen
that the idiosyncratic risks can be diversified away by the portfolio allocation.



4 Monte Carlo Examples 

In this section, we examine the finite-sample performance of both the full confidence
upper bound $\xi_T$ defined in (3.1) and H-CLUB, based on three covariance estimators $\hat\Sigma$,
using portfolios w with different gross exposure constraints. Graphical and numerical results
illustrate that $\xi_T$ is indeed a very crude bound and H-CLUB has much better performance
in general. The number of factors and length of time are both fixed with K = 3 and T = 300
respectively. The dimensionality N gradually increases from 20 to 600.

The excess return of the ith stock of a portfolio over the risk-free interest rate, $y_{it}$, is assumed
to follow the Fama-French three-factor model [Fama and French (1992)]:

$$y_{it} = b_{i1}f_{1t} + b_{i2}f_{2t} + b_{i3}f_{3t} + u_{it}.$$

The first factor is the excess return of the whole equity market, while the second and third
factors are SMB ("small minus big" cap) and HML ("high minus low" book/price) respectively.
Using US equity market data, we calibrate a submodel to generate the loadings
$b_i = (b_{i1}, b_{i2}, b_{i3})'$, the idiosyncratic noises $u_t$ and the factors $f_t = (f_{1t}, f_{2t}, f_{3t})'$.



4.1 Calibration 

To calibrate parameters in the model, we use the data on daily returns of S&P 500's top
100 constituents ranked by market capitalization (on June 29th, 2012), the data on 3-month
Treasury bill rates, and daily return data of the Fama-French factors. They are obtained
from the COMPUSTAT database, the data library of Kenneth French's website, and the CRSP
database respectively. The excess returns $(y_t, f_t)$ are analyzed for the period from July 1st,
2008 to June 29th, 2012, approximately 1000 trading days.

(1) Calculate the least squares estimator $\hat B$ of $y_t = Bf_t + u_t$, and compute the sample mean
vector $\mu_b$ and sample covariance matrix $\Sigma_b$ of all the row vectors of $\hat B$. These parameters are
reported in Table 1. The factor loadings $\{b_i\}_{i=1}^{N}$ of the simulated models are then generated
from a trivariate Gaussian distribution $N_3(\mu_b, \Sigma_b)$.



    $\mu_b$                 $\Sigma_b$
   0.9833      0.0921   -0.0178    0.0436
  -0.1233     -0.0178    0.0862   -0.0211
   0.0839      0.0436   -0.0211    0.7624

Table 1: Mean and covariance used to generate $b_i$

(2) Assume that the factors follow the stationary vector autoregressive VAR(1) model
$f_t = \mu + \Phi f_{t-1} + \varepsilon_t$ for some $3 \times 3$ matrix $\Phi$, where $\varepsilon_t$ follows i.i.d. $N_3(0, \Sigma_\varepsilon)$. The model
parameters $\Sigma_\varepsilon$ and $\Phi$ are calibrated using the daily excess returns of the Fama-French
factors $f_t$. The covariance matrix $\mathrm{cov}(f_t)$ is then obtained by solving the linear equation
$\mathrm{cov}(f_t) = \Phi\,\mathrm{cov}(f_t)\Phi' + \Sigma_\varepsilon$. Results are summarized in Table 2.







    $\mu$                    $\Phi$                              $\mathrm{cov}(f_t)$
   0.0260     -0.1006    0.2803   -0.0365      3.2351    0.1783    0.7783
   0.0211     -0.0191   -0.0944    0.0186      0.1783    0.5069    0.0102
  -0.0043      0.0116   -0.0272    0.0272      0.7783    0.0102    0.6586

Table 2: Parameters used to generate $f_t$
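
The linear equation in step (2) is a discrete Lyapunov equation and can be solved by vectorization, $\mathrm{vec}(C) = (I - \Phi \otimes \Phi)^{-1}\mathrm{vec}(\Sigma_\varepsilon)$. The sketch below is illustrative only: the helper names `stationary_cov` and `simulate_var1` are hypothetical, the row and column orientation of $\Phi$ is an assumption about how Table 2 is laid out, and $\Sigma_\varepsilon$ (which the table does not report) is backed out from the table values, so the example only checks that the solver reproduces $\mathrm{cov}(f_t)$.

```python
import numpy as np


def stationary_cov(Phi, Sigma_eps):
    """Solve C = Phi C Phi' + Sigma_eps for C via the Kronecker identity."""
    k = Phi.shape[0]
    A = np.eye(k * k) - np.kron(Phi, Phi)
    return np.linalg.solve(A, Sigma_eps.reshape(-1)).reshape(k, k)


def simulate_var1(mu, Phi, Sigma_eps, T, rng, burn_in=200):
    """Simulate f_t = mu + Phi f_{t-1} + eps_t; returns a T x k array."""
    k = len(mu)
    chol = np.linalg.cholesky(Sigma_eps)
    f = np.linalg.solve(np.eye(k) - Phi, mu)   # start at the stationary mean
    out = np.empty((T, k))
    for t in range(-burn_in, T):               # burn-in length is an assumption
        f = mu + Phi @ f + chol @ rng.standard_normal(k)
        if t >= 0:
            out[t] = f
    return out


# Values as read from Table 2 (orientation of Phi assumed).
mu = np.array([0.0260, 0.0211, -0.0043])
Phi = np.array([[-0.1006, 0.2803, -0.0365],
                [-0.0191, -0.0944, 0.0186],
                [0.0116, -0.0272, 0.0272]])
cov_f = np.array([[3.2351, 0.1783, 0.7783],
                  [0.1783, 0.5069, 0.0102],
                  [0.7783, 0.0102, 0.6586]])
# Sigma_eps is not reported in the table; it is implied by the stationarity equation.
Sigma_eps = cov_f - Phi @ cov_f @ Phi.T
```

Calling `simulate_var1(mu, Phi, Sigma_eps, T=300, rng=np.random.default_rng(0))` then gives one simulated factor path of the length used in the experiments.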



(3) The error covariance matrix $\Sigma_u$ is sparse in our setting. For each fixed N, it is created
by $\Sigma_u = D\Sigma_0D$, where $D = \mathrm{diag}(\sigma_1, \ldots, \sigma_N)$. To be more specific, $\sigma_1, \ldots, \sigma_N$ are generated
independently from a Gamma distribution $G(\alpha, \beta)$, in which $\alpha$ and $\beta$ are selected to match
the sample mean and sample standard deviation of the 100 standard deviations of the errors
$\hat u_t = y_t - \hat Bf_t$ (recall that each $\hat u_t$ is 100 dimensional). An additional restriction is imposed
on $\sigma_i$ that only values in between the minimum and maximum of the standard deviation
of $\hat u_t$ are accepted. We then generate the off-diagonal entries of the correlation matrix $\Sigma_0$
independently from a Gaussian distribution, with mean and standard deviation equal to
those of the sample correlations of the estimated residuals. Moreover, absolute values of the
off-diagonal entries are set to no greater than 0.95. Finally the hard-thresholding is applied
to make $\Sigma_0$ sparse, where the threshold is set to be the smallest constant that makes $\Sigma_0$
positive definite.
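
Step (3) can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the function `make_sigma_u` and all of its arguments are hypothetical, the calibrated moments are not given in the text and must be supplied by the user, and the "smallest constant" is searched only over a grid of 96 thresholds between 0 and 0.95, which is an approximation introduced here.

```python
import numpy as np


def make_sigma_u(N, sd_mean, sd_std, sd_min, sd_max, corr_mean, corr_std, rng):
    """Return (Sigma_u, threshold) with Sigma_u = D Sigma_0 D sparse and positive definite."""
    # Gamma shape/scale matching the target mean and standard deviation.
    shape, scale = (sd_mean / sd_std) ** 2, sd_std ** 2 / sd_mean
    sigmas = np.empty(0)
    while sigmas.size < N:
        # Keep only draws inside [sd_min, sd_max] (rejection sampling).
        draw = rng.gamma(shape, scale, size=N)
        sigmas = np.concatenate([sigmas, draw[(draw >= sd_min) & (draw <= sd_max)]])
    D = np.diag(sigmas[:N])
    # Gaussian off-diagonal correlations, capped at 0.95 in absolute value.
    upper = np.triu(rng.normal(corr_mean, corr_std, size=(N, N)), k=1)
    upper = np.clip(upper, -0.95, 0.95)
    Sigma0 = upper + upper.T + np.eye(N)
    # Hard-threshold with the smallest grid value that yields positive definiteness.
    for thr in np.linspace(0.0, 0.95, 96):
        S0 = np.where(np.abs(Sigma0) > thr, Sigma0, 0.0)
        np.fill_diagonal(S0, 1.0)
        if np.linalg.eigvalsh(S0).min() > 1e-8:
            break
    return D @ S0 @ D, thr
```

At the largest grid value 0.95 every off-diagonal entry is removed, so the search always terminates with at worst a diagonal $\Sigma_u$.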
4.2 Representative portfolios

We examine the performance of H-CLUB based on w with several different gross
exposures. For a given exposure c and given number of assets N, there are infinitely many
portfolios w that satisfy $\sum_{i=1}^{N}w_i = 1$ and $\sum_{i=1}^{N}|w_i| = c$. In order to be representative, we
take some portfolios randomly from this set. This task, which generates portfolios uniformly from the
above set in $\mathbb{R}^N$, is of independent interest for portfolio optimization and research. It is also
challenging.

Let $w^+$ be the total long position and $w^-$ be the total short position. Then $w^+ =
(c + 1)/2$ and $w^- = (c - 1)/2$. For c = 1, there are no short positions. For c > 1, there
are both long and short positions. The identities (or indices) of long and short positions
are hard to identify, but the following sampling scheme is a reasonable approximation: The
positive positions are determined by a Bernoulli trial (N times) with probability of success
$w^+/(w^- + w^+) = (c + 1)/(2c)$. Once the identities are determined, we can normalize them
and the problem reduces to the case with c = 1. For the case with c = 1, the uniform
distribution on the set $\{w : \sum_{i=1}^{N}w_i = 1, w_i \ge 0\}$ can be generated from a normalized
exponential distribution:

$$w_i = \zeta_i\Big/\sum_{j=1}^{N}\zeta_j, \qquad \zeta_i \sim \text{i.i.d. standard exponential}.$$

Combining the above two steps, we can generate a randomly selected portfolio
from its feasible set as follows.

1. Generate a positive integer k, the number of stocks with positive weights in w, from a
binomial distribution $\mathrm{Bin}(N, \frac{c+1}{2c})$.

2. Let $w_+ = (w_1^+, \ldots, w_k^+)$ be a temporary vector of the positive weights in w. Generate
independently $\{\zeta_i\}_{i=1}^{k}$ from the standard exponential distribution and set each $w_i^+ =
(c + 1)\zeta_i/(2\sum_{j=1}^{k}\zeta_j)$.

3. The temporary negative weights in $w_- = (w_1^-, \ldots, w_{N-k}^-)$ are generated analogously
with each $w_i^- = (1 - c)\zeta_i/(2\sum_{j=1}^{N-k}\zeta_j)$, where $\{\zeta_j\}_{j=1}^{N-k}$ are obtained independently from
the standard exponential distribution.

4. Take the portfolio weights w as a random permutation of the vector $(w_+, w_-)$.
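
The four steps above can be sketched as a short function. The name `random_portfolio` is hypothetical, and the resampling of k when one side would be empty (so that both constraints can still hold) is a convenience added here; the text does not say how that edge case is handled.

```python
import numpy as np


def random_portfolio(N, c, rng):
    """Draw w with sum(w) = 1 and sum(|w|) = c following steps 1-4."""
    p_long = (c + 1.0) / (2.0 * c)
    while True:
        # Step 1: number of long positions.
        k = rng.binomial(N, p_long)
        # Resample if a required side is empty (handling assumed, not from the text).
        if k >= 1 and (c == 1 or k <= N - 1):
            break
    # Step 2: long weights sum to (c + 1) / 2.
    zeta_pos = rng.exponential(size=k)
    w_pos = (c + 1.0) * zeta_pos / (2.0 * zeta_pos.sum())
    # Step 3: short weights sum to (1 - c) / 2, which is non-positive.
    if k < N:
        zeta_neg = rng.exponential(size=N - k)
        w_neg = (1.0 - c) * zeta_neg / (2.0 * zeta_neg.sum())
    else:
        w_neg = np.empty(0)
    # Step 4: random permutation of the combined vector.
    return rng.permutation(np.concatenate([w_pos, w_neg]))
```

For c = 1.6 the long weights sum to 1.3 and the short weights to -0.3, which is the 130/30 structure.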
4.3 Simulation



For each simulation with a given c, we fix T = 300 and gradually increase N from 20 to
600 in multiples of 20. For each fixed N, we use 50 different model parameters and 200
testing portfolios for each given set of model parameters, so that a total of 10,000 portfolios
are used. In other words, we iterate the following steps 50 times, record values
of $R(w)$, $\Delta = |w'(\hat\Sigma - \Sigma)w|$, $\xi_T$ and $\hat U(\tau)$, and compute their means and standard
deviations. The details are summarized as follows:

1. Generate $\{b_i\}_{i=1}^{N}$ independently from $N_3(\mu_b, \Sigma_b)$. Set $B = (b_1, \ldots, b_N)'$.

2. Generate $\{u_t\}_{t=1}^{T}$ independently from $N_N(0, \Sigma_u)$.

3. Generate $\{f_t\}_{t=1}^{T}$ from a VAR(1) model $f_t = \mu + \Phi f_{t-1} + \varepsilon_t$ with parameters specified
in the calibration part.

4. Calculate $y_t = Bf_t + u_t$ for $t = 1, \ldots, T$.

5. Calculate the sample covariance matrix $S = T^{-1}\sum_{t=1}^{T}(y_t - \bar y)(y_t - \bar y)'$; obtain the
factor-based covariance estimator $\hat\Sigma_f$ by using the hard-thresholding rule with the
threshold $\omega_T = 0.10K\sqrt{\log N/T}$; and get the POET covariance estimator $\hat\Sigma_P$ using
the soft-thresholding with thresholding parameter $0.5\sqrt{\hat\theta_{ij}}\left(\sqrt{\log N/T} + 1/\sqrt{N}\right)$.

6. Generate 200 w according to the method described in Section 4.2.

7. Over the 200 generated portfolios w, compute the average of true risk $R(w) = \sqrt{w'\Sigma w}$.
Also for $\hat\Sigma = S$, $\hat\Sigma_f$ and $\hat\Sigma_P$, compute their respective average of $\Delta = |w'(\hat\Sigma - \Sigma)w|$,
$\xi_T = \|w\|_1^2\|\hat\Sigma - \Sigma\|_{\max}$ and $\hat U(0.05) = 2\sqrt{\hat\sigma^2/T}$. In our setting, the number of lags
L = 5.
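
For the sample covariance estimator, the three quantities in step 7 can be computed as in the sketch below. It is an illustration under assumptions: the function `risk_error_bounds` is hypothetical, the autocovariances are formed by analogy with (3.11) using $(w'R_t)^2$ centred at $w'Sw$, the multiplier 2 follows the text's $\hat U(0.05) = 2\sqrt{\hat\sigma^2/T}$, and clipping a negative variance estimate at zero is a safeguard added here.

```python
import numpy as np


def risk_error_bounds(Y, Sigma, w, L=5, z=2.0):
    """Return (Delta, xi_T, U) for the sample covariance of the N x T returns Y."""
    N, T = Y.shape
    Yc = Y - Y.mean(axis=1, keepdims=True)
    S = Yc @ Yc.T / T                                # sample covariance
    delta = abs(w @ (S - Sigma) @ w)                 # true estimation error
    xi = np.abs(w).sum() ** 2 * np.abs(S - Sigma).max()   # crude bound
    x = (w @ Yc) ** 2                                # squared portfolio returns
    zc = x - w @ S @ w                               # centred series (mean is w'Sw)

    def gamma(h):
        return np.sum(zc[h:] * zc[:T - h]) / T

    sigma2 = gamma(0) + 2.0 * sum(gamma(h) for h in range(1, L + 1))
    sigma2 = max(sigma2, 0.0)                        # safeguard, not from the paper
    return delta, xi, z * np.sqrt(sigma2 / T)
```

Since $|w'Aw| \le \|A\|_{\max}\|w\|_1^2$ for any matrix $A$, the crude bound returned here is never smaller than $\Delta$, whereas the H-CLUB exceeds $\Delta$ only with high probability.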

Under several gross exposure constraints c, we produce the graph of risk domain by
plotting $R(w)$ as a function of c and N (20 to 600 in increments of 20). Averages of $\Delta$, $\xi_T$
and $\hat U(0.05)$ are also plotted against N, for all three types of covariance estimators $\hat\Sigma$. We
will observe from the graphs that portfolios with larger c are exposed to higher risks.

Finally, we fix the dimensionality N = 600 and the number of simulation replications is
now set to 500. Values of two ratios are recorded, namely the ratio of bounds

$$RE_1 = \frac{\xi_T}{\hat U(0.05)} = \frac{\|w\|_1^2\|\hat\Sigma - \Sigma\|_{\max}}{2\sqrt{\widehat{\mathrm{var}}(w'\hat\Sigma w)}}$$

and the relative error

$$RE_2 = \frac{\hat U(0.05)}{4w'\hat\Sigma w} = \frac{\sqrt{\widehat{\mathrm{var}}(w'\hat\Sigma w)}}{2w'\hat\Sigma w}.$$

These are computed for c in a practical range of [1, 2] and for several lengths of the time series.
The means and standard deviations of the two ratios are summarized in tables below.



4.4 Results 

In Figure 1, averages of annualized true risks $R(w)$ of 10,000 portfolios over 50 sets of model
parameters are plotted against dimensionality N. Multiple curves with different settings on
c are produced for comparison purposes. As shown in the figure, the average of the actual risk
ranges from less than 30% to around 50% per annum, as c varies from 1 to 4 and N gradually
grows from 20 to 600.

Figure 1: Averages of annualized risks $R(w)$ with $\|w\|_1 = 1, 2, 3$ and 4, over 10,000 portfolios.
[Plot titled "The Risk Domain"; horizontal axis: N.]

The following two observations can be made from Figure 1.

(1) The average risk is higher for a larger exposure parameter c. This is consistent with 
the fact that portfolios with greater gross exposure are more volatile, and hence incur 
higher risk. 

(2) Given a gross exposure level c, as the portfolio size N increases, the average risk
decreases. The rate of decline is very fast until N is around 150. This is consistent
with the theory that as N increases, the portfolio becomes more diversified and the
idiosyncratic risk is reduced through diversification.



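Figure 2: Averages of $\Delta = |w'(\hat\Sigma - \Sigma)w|$ (blue curve), $\hat U(0.05) = 2\sqrt{\widehat{\mathrm{var}}(w'\hat\Sigma w)}$ (dashed
curve) and $\xi_T = \|w\|_1^2\|\hat\Sigma - \Sigma\|_{\max}$ (red curve) for c = 1 and 1.6 over 10,000 portfolios.
Panels: (a) c = 1; (b) c = 1.6.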



In Figures 2 and 3, the average risk estimation errors are plotted along with their
estimated error bounds for different exposure parameters c = 1, 1.6, 2 and 3, using different
estimators $\hat\Sigma = S$, $\hat\Sigma_f$ and $\hat\Sigma_P$. In particular, c = 1.6 results in 130% long positions and
30% short positions (130/30 strategy). The 130/30 structure is popular in long-short funds.
In each of the small figures, the dashed curve corresponds to $\hat U(0.05)$, the solid red curve
corresponds to $\xi_T$ and the solid blue curve corresponds to $\Delta$. Based on these plots, we can
observe the following features.

(1) The dashed curves lie entirely above the solid blue one, reflecting the validity of the
95% error bound $\hat U(0.05)$.

(2) The full confidence upper bound $\xi_T$ is indeed a very crude bound and is much larger
than $\hat U(0.05)$. The larger c is, the larger the gap, which will be further detailed
in Table 3.

(3) H-CLUB (dashed curve) slightly increases with larger N, but its degree of increase is
much smaller than that of the crude bound $\xi_T$.

Figure 3: Same as in Figure 2, with c = 2 and 3. Panels: (c) c = 2; (d) c = 3.

Means and standard deviations (in parentheses) of $RE_1 = \xi_T/\hat U(0.05)$ are summarized in
Table 3, which quantifies the relation between the full confidence bound and the H-CLUB.
Numerical results justify our observations in Figures 2 and 3 in the sense that $\xi_T$ is in general
many times greater than $\hat U(\tau)$. Moreover, the ratio $\xi_T/\hat U(\tau)$ increases dramatically as the
gross exposure $\|w\|_1$ increases.



Table 3: Averages and standard deviations (in parentheses) of $RE_1$ over 500 iterations using
three different estimators.

$RE_1$             c = 1      c = 1.2    c = 1.4    c = 1.6    c = 1.8    c = 2
$S$                5.1280     7.4632     10.4257    12.7665    16.7107    20.7675
                  (2.1303)   (3.2754)   (4.3942)   (5.5150)   (7.1332)   (9.0050)
$\hat\Sigma_f$     5.1294     7.4764     10.4544    12.7822    16.8100    20.9012
                  (2.1630)   (3.3155)   (4.4402)   (5.5731)   (7.2942)   (9.2622)
$\hat\Sigma_P$     5.0910     7.3989     10.3536    12.6485    16.6076    20.6935
                  (2.1672)   (3.3350)   (4.5094)   (5.5913)   (7.3239)   (9.3091)



Averages and standard deviations of the relative error $RE_2 = \sqrt{\widehat{\mathrm{var}}(w'\hat\Sigma w)}/(2w'\hat\Sigma w)$ with
two choices of the length of time series T are summarized in Tables 4 and 5. $RE_2$ measures
the accuracy of the perceived risk $\hat R(w)$ with respect to the actual risk $R(w)$; indeed, by the
delta method, $RE_2 \approx \mathrm{ASD}(\hat R(w))/R(w)$. From the tables, it is not difficult to observe that standard
deviations are small when compared to their corresponding means. The results also show that
the relative errors are negligible, at around 3% to 5%, ensuring a high level of accuracy for the
estimate $\hat R(w)$. More interestingly, this ratio is approximately independent
of the gross exposure c but sensitive to the length of the time series: $RE_2$ steadily decreases
as T grows. In addition, the asymptotic standard deviations of the factor-based estimators
($\hat\Sigma_f$ and $\hat\Sigma_P$) are slightly smaller than that of the sample covariance based risk estimator.



Table 4: Averages and standard deviations (in parentheses) of $RE_2$ over 500 iterations, with T = 200.

$RE_2$             c = 1       c = 1.2     c = 1.4     c = 1.6     c = 1.8     c = 2
$S$                4.8381%     4.8076%     4.6563%     4.7499%     4.7267%     4.7723%
                  (1.0015%)   (1.0147%)   (0.9368%)   (0.9989%)   (0.9894%)   (0.9648%)
$\hat\Sigma_f$     4.8264%     4.8038%     4.6409%     4.7316%     4.7150%     4.7458%
                  (0.9971%)   (1.0110%)   (0.9385%)   (0.9933%)   (0.9835%)   (0.9646%)
$\hat\Sigma_P$     4.8305%     4.8055%     4.6443%     4.7350%     4.7133%     4.7478%
                  (0.9993%)   (1.0104%)   (0.9368%)   (0.9922%)   (0.9859%)   (0.9624%)


Table 5: Averages and standard deviations (in parentheses) of $RE_2$ over 500 iterations, with T = 400.

$RE_2$             c = 1       c = 1.2     c = 1.4     c = 1.6     c = 1.8     c = 2
$S$                3.4773%     3.4840%     3.4836%     3.4857%     3.5176%     3.4846%
                  (0.5081%)   (0.4936%)   (0.4783%)   (0.5303%)   (0.5117%)   (0.5602%)
$\hat\Sigma_f$     3.4693%     3.4759%     3.4699%     3.4708%     3.5009%     3.4588%
                  (0.5081%)   (0.4976%)   (0.4775%)   (0.5217%)   (0.5127%)   (0.5621%)
$\hat\Sigma_P$     3.4744%     3.4773%     3.4737%     3.4737%     3.5029%     3.4619%
                  (0.5092%)   (0.4964%)   (0.4783%)   (0.5229%)   (0.5132%)   (0.5629%)


Finally, we also observe from Tables 4 and 5 that the asymptotic variances (reflected by $\hat U(\tau)$)
of the estimators based on known and unknown factors are almost the same, and slightly
smaller than that of the sample covariance estimator.



5 Empirical Studies 

We assess the performance of H-CLUB in a portfolio allocation. We use the daily excess
returns of 100 industrial portfolios formed on size and book-to-market ratio from the
website of Kenneth French. The study period is from July 1st, 2008 to June 29th, 2012
(T = 1000). At the end of each month the covariance matrix is estimated by three estimators,
the sample covariance, the factor-based estimator, and the POET estimator, using daily
returns of the preceding 12 months (T = 252). In particular, we employ the Fama-French
three-factor model to construct the factor-based estimator. Two types of strategies are
tested, namely the equally weighted portfolio and the minimum variance portfolio. The
optimal portfolios are constructed under different exposure constraints (c = 1 and c = 1.6).
The equally weighted portfolio is given by $w = (1/N, \ldots, 1/N)'$. The minimum variance
portfolio is given by

$$w = \arg\min_{w'\mathbf{1} = 1,\ \|w\|_1 = c} w'\hat\Sigma w.$$

Portfolios are held for one month (T = 21) and rebalanced at the beginning of the next
month. Their actual risks in the holding month for w defined above are

$$R(w) = (w'\Sigma w)^{1/2}, \qquad \Sigma = \frac{1}{21}\sum_{t=1}^{21}y_ty_t'.$$

This is aggregated over the entire testing period.
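
The realized risk in a holding month can be computed directly from the formula above. In this sketch the function name `realized_risk` is hypothetical, and the annualization by $\sqrt{252}$ is an assumption: the text reports annualized risks but does not state the convention used.

```python
import numpy as np


def realized_risk(w, Y_hold, periods_per_year=252):
    """Return (daily, annualized) risk from the N x 21 holding-month excess returns."""
    n_days = Y_hold.shape[1]
    Sigma_real = Y_hold @ Y_hold.T / n_days          # (1/21) * sum of y_t y_t'
    daily = float(np.sqrt(w @ Sigma_real @ w))
    # Square-root-of-time annualization (assumed convention).
    return daily, daily * np.sqrt(periods_per_year)
```

Note that the formula uses raw second moments of the excess returns, with no mean subtraction, as in the display above.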



Table 6: True risk errors and estimated risk errors based on the 100 Fama-French Industrial
Portfolios.

                          Average of             Average of                    Average of   True         Estimated
Strategy                  $\Delta$ ($\times 10^{-4}$)   $\hat U(0.01)$ ($\times 10^{-4}$)   True Risk    Risk Error   Risk Error

Sample Covariance Matrix Estimator
Equal weighted            2.356                  2.757                         20.81%       11.18%       11.37%
Min variance (c = 1)      1.006                  1.232                         14.38%       7.00%        7.45%
Min variance (c = 1.6)    0.497                  0.622                         11.58%       4.69%        5.18%

Factor-Based Covariance Matrix Estimator
Equal weighted            2.352                  2.693                         20.81%       11.16%       11.22%
Min variance (c = 1)      0.999                  1.234                         14.45%       6.95%        7.48%
Min variance (c = 1.6)    0.475                  0.607                         11.79%       4.52%        5.07%

POET Estimator
Equal weighted            2.353                  2.757                         20.81%       11.17%       11.38%
Min variance (c = 1)      1.005                  1.171                         14.38%       6.99%        7.07%
Min variance (c = 1.6)    0.490                  0.572                         11.59%       4.61%        4.64%

Here $\Delta = |w'(\hat\Sigma - \Sigma)w|$ and $\hat U(0.01) = 2.58(\widehat{\mathrm{var}}(w'\hat\Sigma w))^{1/2}$. The True Risk is $R(w)$. The True Risk
Error and Estimated Risk Error are $|(w'\hat\Sigma w)^{1/2} - (w'\Sigma w)^{1/2}|$ and $\hat U(0.01)/\sqrt{4w'\hat\Sigma w}$ respectively.



For each covariance matrix estimator and strategy, we study five quantities, whose
respective averages over the whole study period are summarized in Table 6. In particular,
the estimated risk error $\hat U(0.01)/\sqrt{4w'\hat\Sigma w}$ is the H-CLUB for the true risk error. See, for
example, Corollary 3.1. Here the risks are annualized. By comparing the first two columns
in Table 6, we observe that $\hat U(0.01)$ is uniformly greater than $\Delta$, regardless of the strategies
and the covariance matrix estimators. This is in line with the expectation that $\hat U(0.01)$ is a
99% upper bound of the estimation error of portfolio variances. Moreover, as shown in the
two rightmost columns, results are satisfactory in the sense that the estimated risk errors
are close (< 1% per annum) to the true risk error.



6 Conclusions 

In this paper we address the estimation and assessment for the risk of a large portfolio. 
The risk is estimated by a substitution of a good estimator of the volatility matrix. We 
propose factor-based risk estimators, based on the approximate factor model with known 
factors and unknown factors. For the first time in the literature, we derive the limiting 
distribution of the estimated risks under high dimensionality. 

Given that the existing upper bound for the risk estimation error is too crude and not 
applicable in practice, we introduce a new method, H-CLUB, to assess the accuracy of the 
risk estimation based on the confidence intervals. Our numerical results demonstrate that the 
proposed upper bounds significantly outperform the traditional crude bounds, and provide 
insightful assessment of the estimation of the true portfolio risks. 

It is demonstrated in the empirical study that the financial excess returns may not be globally
stationary. Our method also allows for locally stationary time series and can accommodate
slowly time-varying covariance matrices through localization in time (time-domain smoothing).

A Proofs for the Sample Covariance 

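Define $Z_{T,t} = w'R_tR_t'w - Ew'R_tR_t'w$, where $Z$ depends on $T$ through $\dim(R_t) = N =
N_T$ and the allocation vector $w$. Then $\gamma_T(h) = EZ_{T,t}Z_{T,t+h}$. In particular, $\gamma_T(0) = \mathrm{var}(Z_{T,t})$.

A.1 Proof of Lemma 3.1

Lemma A.1. (i) $|(w'Sw)^2 - (w'\Sigma w)^2| = O_p(T^{-1/2}\sigma_T)$.
(ii) $\max_{h\le L}|T^{-1}\sum_{t=1}^{T-h}(w'R_t)^2(w'R_{t+h})^2 - E(w'R_t)^2(w'R_{t+h})^2| = O_p(\sqrt{L/T})$.
(iii) $\max_{h\le L}|w'Sw - T^{-1}\sum_{t=1}^{T-h}(w'R_t)^2| = O_p(L^2w'\Sigma w/T)$.
(iv) $\max_{h\le L}|w'Sw - T^{-1}\sum_{t=1}^{T-h}(w'R_{t+h})^2| = O_p(L^2w'\Sigma w/T)$.

Proof. (i) Note that for any $N \times N$ matrix $A = (a_{ij})$, $|w'Aw| \le \|A\|_{\max}\|w\|_1^2$. Thus
$|(w'Sw)^2 - (w'\Sigma w)^2| = O_p(|w'(S - \Sigma)w|) = O_p(|T^{-1}\sum_{t=1}^{T}Z_{T,t}|)$. The Chebyshev inequality implies
$|T^{-1}\sum_{t=1}^{T}Z_{T,t}| = O_p(T^{-1/2}\sigma_T)$.

(ii) Let $X_{t,h} = (w'R_t)^2(w'R_{t+h})^2$. By the Chebyshev inequality, for any $s > 0$,

$$P\left(\max_{h\le L}\left|\frac{1}{T}\sum_{t=1}^{T}X_{t,h} - EX_{t,h}\right| > s\right) \le L\max_{h\le L}P\left(\left|\frac{1}{T}\sum_{t=1}^{T}X_{t,h} - EX_{t,h}\right| > s\right) \le \frac{L\max_{h\le L}\mathrm{var}(\sum_{t=1}^{T}X_{t,h})}{T^2s^2}.$$

Note that $\max_{h\le L}\mathrm{var}(\sum_{t=1}^{T}X_{t,h}) = O(T)$ since $\max_{h\le L}\mathrm{var}(X_{t,h}) = O(1)$ and
$\max_{h\le L}\sum_{t}|\mathrm{cov}(X_{1,h}, X_{t+1,h})| = O(1)$. Therefore, for arbitrarily small $\epsilon > 0$, by
choosing $s > \sqrt{LM/(\epsilon T)}$, $P(\max_{h\le L}|\frac{1}{T}\sum_{t=1}^{T}X_{t,h} - EX_{t,h}| > s) < \epsilon$, which implies
$\max_{h\le L}|\frac{1}{T}\sum_{t=1}^{T}X_{t,h} - EX_{t,h}| = O_p(\sqrt{L/T})$. The conclusion then follows from the
adjustment of the L terms in the summation.

(iii) The left hand side is $\max_{h\le L}T^{-1}\sum_{t=T-h+1}^{T}(w'R_t)^2 \le \max_{T-L+1\le t\le T}(w'R_t)^2L/T$.
For any $s > 0$, $P(\max_{T-L+1\le t\le T}(w'R_t)^2 > s) \le LP((w'R_1)^2 > s) \le Lw'\Sigma w/s$, which then
implies $\max_{T-L+1\le t\le T}(w'R_t)^2 = O_p(Lw'\Sigma w)$. The desired result then follows.

(iv) A similar argument as above shows $\max_{T+1\le t\le T+L}(w'R_t)^2 = O_p(Lw'\Sigma w)$. Hence
$\max_{h\le L}T^{-1}\sum_{t=T-h+1}^{T}(w'R_{t+h})^2 \le \max_{T+1\le t\le T+L}(w'R_t)^2L/T = O_p(L^2w'\Sigma w/T)$. This
implies that the desired quantity is bounded by $a + O_p(L^2w'\Sigma w/T)$ where

$$a = \max_{h\le L}\left|\frac{1}{T}\sum_{t=1}^{T}\left[(w'R_t)^2 - (w'R_{t+h})^2\right]\right| \le \left|\frac{1}{T}\sum_{t=1}^{L}(w'R_t)^2\right| + \left|\frac{1}{T}\sum_{t=1}^{L}(w'R_{T+t})^2\right|.$$

Note that $|\frac{1}{T}\sum_{t=1}^{L}(w'R_t)^2| \le \max_{1\le t\le L}(w'R_t)^2L/T = O_p(L^2w'\Sigma w/T)$. Similarly we have
$|\frac{1}{T}\sum_{t=1}^{L}(w'R_{T+t})^2| = O_p(L^2w'\Sigma w/T)$. □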

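Lemma A.2. $\max_{h\le L}|\hat\gamma(h) - \gamma_T(h)| = O_p(\sqrt{L/T})$.

Proof. The triangular inequality implies $\max_{h\le L}|\hat\gamma(h) - \gamma_T(h)| \le \sum_{i=1}^{4}a_i$ where

$$a_1 = \max_{h\le L}\left|T^{-1}\sum_{t=1}^{T-h}(w'R_t)^2(w'R_{t+h})^2 - E(w'R_t)^2(w'R_{t+h})^2\right|, \qquad a_2 = |(w'Sw)^2 - (w'\Sigma w)^2|,$$

$$a_3 = w'Sw\max_{h\le L}\left|w'Sw - T^{-1}\sum_{t=1}^{T-h}(w'R_t)^2\right|, \qquad a_4 = w'Sw\max_{h\le L}\left|w'Sw - T^{-1}\sum_{t=1}^{T-h}(w'R_{t+h})^2\right|.$$

We have $w'Sw \le |w'(S - \Sigma)w| + w'\Sigma w = O_p(w'\Sigma w + T^{-1/2}\sigma_T)$. It then follows from
Lemma A.1 and $\sigma_T^2 = O(1)$, $L^3 = O(T)$, $w'\Sigma w = O(1)$ that $a_i = O_p(\sqrt{L/T})$ for $i = 1, \ldots, 4$,
which implies $\max_{h\le L}|\hat\gamma(h) - \gamma_T(h)| = O_p(\sqrt{L/T})$. □

Proof of Lemma 3.1

By the triangular inequality, $|\hat\sigma^2 - \sigma_T^2| \le \sum_{i=1}^{3}b_i$ where

$$b_1 = |\hat\gamma(0) - \gamma_T(0)|, \qquad b_2 = 2\sum_{h=1}^{L}|\hat\gamma(h) - \gamma_T(h)|, \qquad b_3 = 2\left|\sum_{h>L}\gamma_T(h)\right|.$$

Here $b_2 \le 2L\max_{h\le L}|\hat\gamma(h) - \gamma_T(h)| = O_p(L\sqrt{L/T})$. The convergence rate then follows from
Lemma A.2. The second part $\hat U_S(\tau) = o_p(\sqrt{\log N/T})$ is due to $\hat\sigma^2 = O_p(\sigma_T^2)$, as $|\hat\sigma^2 - \sigma_T^2| =
o_p(\sigma_T^2)$ and $\sigma_T^2 = O(1) = o(\log N)$, as $N \to \infty$.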



A.2 Proof of Theorem 3.1

Lemma A.3. (i) $EZ_{T,t}^2 = O(1)$ and $\max_{h\le T}|\gamma_T(h)| = O(1)$.
(ii) For any $K \in [m, T]$, $\mathrm{var}(\sum_{t=1}^{K}Z_{T,t}) = K\gamma_T(0) + 2K\sum_{h=1}^{K}(1 - h/K)\gamma_T(h) = O(K)$.

Proof. (i) It suffices to show $E(w'R_t)^4 = O(1)$. In fact, by $\max_{i\le N}ER_{it}^4 = O(1)$,
$\sum_{i,j,k,l=1}^{N}w_iw_jw_kw_lER_{it}R_{jt}R_{kt}R_{lt} \le \max_{i\le N}ER_{it}^4\|w\|_1^4 = O(1)$. The second part
follows immediately.

(ii) It is well known that for a stationary process with zero mean, $\mathrm{var}(K^{-1}\sum_{t=1}^{K}Z_{T,t}) =
K^{-1}\gamma_T(0) + 2K^{-1}\sum_{h=1}^{K}(1 - h/K)\gamma_T(h)$, which implies the result.

□

Lemma A.4. Under the assumptions of Theorem 3.1,

$$\left[\mathrm{var}\left(\sum_{t=1}^{T}(w'R_t)^2\right)\right]^{-1/2}T\,w'(S - \Sigma)w \to^d N(0,1). \qquad (A.1)$$



Proof. The proof is based on Theorem 2.1 of Peligrad (1996). We have vTw'(S — S)w = 
^ j^gg^g ^2^^ _ var(^,^^ Zt,*) and 5^ = var(^f^^ Zr^t) = 0{T). Also let 
c"t = 7t(0) + 2 XlhLi Irih). By Davydov's inequality (Proposition 2.5 of Fan and Yao, 2003 
with p = 1/2 and g = 1/4), there are constants M, Mi, M2 > such that for any integer h, 

Mm < 8anmY^\E{w'R,YY/\E{w'R,YY/' = M,exp{-M\hr /4) 

where the last equality follows from the a-mixing condition and that i?(w'Rj)^ = 
0(1). By the assumption that afi{T) = 0(7^(0)), the correlation \CoTY{ZT,t, ZT,t+T)\ < 
|7t(T)|/7t(0) = 0(1). Moreover, the Lindeberg condition holds given ma.Xi<^ ERf^ < 
00. Hence the conditions of Theorem 2.1 of Peligrad (1996) are satisfied, which implies 
Bt^ ELi ZT,t ^'^ A/'(0, 1), equivalent to (O). 






Proofs of Theorem 3.1 and Corollary 3.1

By the assumption that $2\sum_{h=1}^{T}h\gamma_T(h)/T = o(\sigma_T^2)$, we have
$T^{-1/2}(\sigma_T^2)^{-1/2}\sum_{t=1}^{T}Z_{T,t} \to^d N(0,1)$. This also implies

$$\sqrt{T/\sigma_T^2}\,w'(S - \Sigma)w \to^d N(0,1). \qquad (A.2)$$

Due to the assumptions that $L^{3/2}T^{-1/2} = o(\sigma_T^2)$ and $\sum_{h>L}\gamma_T(h) = o(\sigma_T^2)$, and Lemma 3.1,
we have $|\hat\sigma^2 - \sigma_T^2| = o_p(\sigma_T^2)$. Since $w'(S - \Sigma)w = O_p(T^{-1/2}\sigma_T)$,

$$\sqrt{T}\,|w'(S - \Sigma)w|\,\left|\hat\sigma^{-1} - \sigma_T^{-1}\right| = o_p(1).$$

It then follows from (A.2) that $\sqrt{T/\hat\sigma^2}\,w'(S - \Sigma)w \to^d N(0,1)$, which gives the H-CLUB.

Corollary 3.1 follows straightforwardly from applying the delta method.



B Proofs for the Factor-based Estimation 



□ 



B.1 Proof of Lemma 3.2

Lemma B.1. $\max_{h\le L}|\hat\gamma_f(h) - \gamma_f(h)| = O_p(\sqrt{(L + \log N)/T})$.

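Proof. The triangular inequality implies $\max_{h\le L}|\hat\gamma_f(h) - \gamma_f(h)| \le \sum_{i=1}^{4}a_i$ where

$$a_1 = \max_{h\le L}\left|T^{-1}\sum_{t=1}^{T-h}(w'\hat Bf_{t+h})^2(w'\hat Bf_t)^2 - E(w'Bf_t)^2(w'Bf_{t+h})^2\right|,$$

$$a_2 = \left|(w'\hat B\,\widehat{\mathrm{cov}}(f_t)\hat B'w)^2 - (w'B\,\mathrm{cov}(f_t)B'w)^2\right|,$$

$$a_3 = w'\hat B\,\widehat{\mathrm{cov}}(f_t)\hat B'w\max_{h\le L}\left|w'\hat B\,\widehat{\mathrm{cov}}(f_t)\hat B'w - T^{-1}\sum_{t=1}^{T-h}(w'\hat Bf_{t+h})^2\right|,$$

$$a_4 = w'\hat B\,\widehat{\mathrm{cov}}(f_t)\hat B'w\max_{h\le L}\left|w'\hat B\,\widehat{\mathrm{cov}}(f_t)\hat B'w - T^{-1}\sum_{t=1}^{T-h}(w'\hat Bf_t)^2\right|.$$

$a_1$ is bounded by $a_{11} + a_{12}$, where

$a_{11} = \max_{h\le L}|T^{-1}\sum_{t=1}^{T-h}(w'Bf_{t+h})^2(w'Bf_t)^2 - E(w'Bf_t)^2(w'Bf_{t+h})^2|$, and
$a_{12} = \max_{h\le L}|T^{-1}\sum_{t=1}^{T-h}[(w'\hat Bf_{t+h})^2(w'\hat Bf_t)^2 - (w'Bf_{t+h})^2(w'Bf_t)^2]|$.

Given the assumption that $\max_{h\le L}\sum_{t\ge 1}\mathrm{cov}[(w'Bf_1)^2(w'Bf_{1+h})^2, (w'Bf_{1+t})^2(w'Bf_{1+t+h})^2] =
O(1)$, the same argument of the proof of Lemma A.1 (ii) implies $a_{11} = O_p(\sqrt{L/T})$. On the
other hand, by (B.14) of Fan et al. (2011), $\|\hat B - B\|_{\max} = O_p(\sqrt{\log N/T})$, which implies
$\|w'(\hat B - B)\| = O_p(\sqrt{\log N/T})$. It is then easy to show that $a_{12} = O_p(\sqrt{\log N/T})$. It follows
that $a_1 = O_p(\sqrt{(L + \log N)/T})$. By the triangular inequality, $a_2 = O_p(\sqrt{\log N/T})$. Finally,
by the same argument of the proof of Lemma A.1, we have $a_3 = O_p(L^2/T) = a_4$.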



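Proof of Lemma 3.2

We have $|\hat\sigma_f^2 - \sigma_f^2| \le \sum_{i=1}^{3}b_i$, where $b_1 = |\hat\gamma_f(0) - \gamma_f(0)|$, $b_3 = 2|\sum_{h>L}\gamma_f(h)|$ and
$b_2 = 2\sum_{h=1}^{L}|\hat\gamma_f(h) - \gamma_f(h)|$. Lemma B.1 implies $b_2 \le 2L\max_{h\le L}|\hat\gamma_f(h) - \gamma_f(h)| =
O_p(L\sqrt{(L + \log N)/T})$, which gives the convergence rate. The second statement is due
to $\hat\sigma_f^2 = o_p(\log N)$.

□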



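B.2 Proof of Theorem 3.2

Write $R = (R_1, \ldots, R_T)$, which is $N \times T$; $F = (f_1, \ldots, f_T)$, which is $r \times T$; and $\widehat{\mathrm{cov}}(f_t) = FF'/T$. We
have $\hat B = RF'(FF')^{-1}$. Define $C_T = \hat B - B$ and $D_T = \widehat{\mathrm{cov}}(f_t) - \mathrm{cov}(f_t)$. Then we have the
following decomposition: $w'(\hat\Sigma_f - \Sigma)w = \sum_{i=1}^{4}d_i$, where

$d_1 = w'BD_TB'w$; $d_2 = 2w'C_T\widehat{\mathrm{cov}}(f_t)B'w$;
$d_3 = w'C_T\widehat{\mathrm{cov}}(f_t)C_T'w$; $d_4 = w'(\hat\Sigma_u - \Sigma_u)w$.

We now study each of the above four terms separately. Let $E = (u_1, \ldots, u_T)$ be $N \times T$. Then
$C_T = EF'(FF')^{-1}$.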

Lemma B.2. (i) $\|FE'w\| = O_p(T^{1/2}(w'\Sigma_uw)^{1/4}(E|w'u_t|^4)^{1/8})$.

(ii) $|d_2| = O_p(T^{-1/2}(w'\Sigma_uw)^{1/4}(E|w'u_t|^4)^{1/8})$.

Proof. We have

$$E\|FE'w\|^2 = E[\mathrm{tr}(w'EF'FE'w)] = \mathrm{tr}[E(FE'ww'EF')] = \mathrm{tr}[E(FE(E'ww'E\,|\,F)F')] = \mathrm{tr}[E(FE(E'ww'E)F')].$$

For the inner expectation, $E(E'ww'E) = (E[u_t'ww'u_s])_{t,s\le T} = (\mathrm{cov}(w'u_t, w'u_s))_{t,s\le T}$.
By Davydov's inequality (see, e.g., Proposition 2.5 of Fan and Yao, 2003 with p = 1/2 and
q = 1/4), $|\mathrm{cov}(w'u_t, w'u_s)| \le 8\alpha(|t - s|)^{1/4}(w'\Sigma_uw)^{1/2}(E|w'u_t|^4)^{1/4}$, where $\alpha(\cdot)$ denotes the
$\alpha$-mixing coefficient. By $\sum_{t=1}^{\infty}\alpha(t)^{1/4} < \infty$, we have

$$E\|FE'w\|^2 = \sum_{k=1}^{r}\sum_{t=1}^{T}\sum_{s=1}^{T}\mathrm{cov}(w'u_t, w'u_s)E(f_{kt}f_{ks})$$

$$= O(1)(w'\Sigma_uw)^{1/2}(E|w'u_t|^4)^{1/4}\sum_{t=1}^{T}\sum_{s=1}^{T}\alpha(|t - s|)^{1/4} = O\left(T(w'\Sigma_uw)^{1/2}(E|w'u_t|^4)^{1/4}\right),$$

which then implies (i). For part (ii), we have

$$|d_2| = 2|w'B\,\widehat{\mathrm{cov}}(f_t)(FF')^{-1}FE'w| = \frac{2}{T}|w'BFE'w| \le \frac{2}{T}\|w'B\|\,\|FE'w\|.$$

Now write $B = (\lambda_{ij})_{i\le N, j\le r}$; then $\|w'B\|^2 = \sum_{j=1}^{r}(\sum_{i=1}^{N}w_i\lambda_{ij})^2 \le \max_{ij}|\lambda_{ij}|^2r\|w\|_1^2 = O(1)$.

□

Lemma B.3. For the factor-based thresholded error covariance matrix,

$$\|\hat\Sigma_u - \Sigma_u\| = O_p\left(s_N\left(\frac{\log N}{T}\right)^{1/2-q/2}\right).$$

Proof. By Lemma 3.1 in Fan et al. (2011), we have $\max_{i\le N}T^{-1}\sum_{t=1}^{T}(\hat u_{it} - u_{it})^2 =
O_p(\log N/T)$. The result then follows from Theorem A.1 in Fan et al. (2013). □

Lemma B.4. (i) $|d_3| = O_p(T^{-1}(w'\Sigma_uw)^{1/2}(E|w'u_t|^4)^{1/4})$.
(ii) $|d_4| = O_p(s_N(\log N/T)^{1/2-q/2}w'\Sigma_uw)$.

Proof. (i) Because $\|(FF')^{-1}\| = O_p(T^{-1})$,
$|d_3| = T^{-1}w'EF'(FF')^{-1}FE'w = O_p(T^{-2}\|FE'w\|^2)$. The result then follows from Lemma B.2.
(ii) It follows from $|d_4| \le \|\hat\Sigma_u - \Sigma_u\|\,\|w\|^2 \le \lambda_{\min}^{-1}(\Sigma_u)\|\hat\Sigma_u - \Sigma_u\|\,w'\Sigma_uw$ and Lemma B.3.

□

Lemma B.5. $\sum_{h=1}^{\infty}|\gamma_f(h)| < \infty$.

Proof. By Davydov's inequality (Proposition 2.5 of Fan and Yao, 2003 with p = 1/2 and 
q = 1/4), there are constants Mi, M2 > such that for any integer h, 

$$|\gamma_f(|h|)| \le 8\alpha_f(|h|)^{1/4}\big[E(w'Bf_t)^4\big]^{1/2}\big[E(w'Bf_t)^8\big]^{1/4} = M_2\exp(-M_1|h|^{r_3}/4),$$

where the last equality follows from the $\alpha$-mixing condition as well as the fact that $E(w'Bf_t)^8 = O(1)$ due to $\|w\|_1 = O(1)$. The result then follows from $\sum_{h=1}^{\infty}\exp(-Ch^{r_3}) < \infty$ for any $C, r_3 > 0$. □



Lemma B.6. $\sqrt{T/\sigma_f^2}\; d_1 \xrightarrow{d} \mathcal{N}(0,1)$.



Proof. Let $Z_{T,t} = w'B(f_tf_t' - Ef_tf_t')B'w$, which depends on $T$ through $\dim(w) = N_T$.
Hence $d_1 = T^{-1}\sum_{t=1}^{T} Z_{T,t}$. Note that $\|w'B\|^2 \le r\|B\|_{\max}^2\|w\|_1^2 = O(1)$. Hence
$EZ_{T,t}^2 = O(1)$. Similar to the proof of Theorem 3.1, we define $s_T^2 = \mathrm{var}\big(\sum_{t=1}^{T} Z_{T,t}\big)$
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and = 'vajL{Ylt=i ^T,t) = 0{T). By the assumption that QfiT) = 0(7/(0)), the correla- 
tion |Corr(Zr,t, ^T.t+r)! < 17/(^)1/7/(0) = o(l). Moreover, the Lindeberg condition holds 
given the exponential tail of f*. Hence the conditions of Theorem 2.1 of Peligrad (1996) are 
satisfied, which implies 



$$s_T^{-1}\sum_{t=1}^{T} Z_{T,t} \xrightarrow{d} \mathcal{N}(0,1). \tag{B.1}$$



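For $\gamma_f(h) = \mathrm{cov}\big((w'Bf_t)^2, (w'Bf_{t+h})^2\big)$, we have $\gamma_f(h) = \mathrm{cov}(Z_{T,t}, Z_{T,t+h})$. Now $s_T^2 = T\gamma_f(0) + 2T\sum_{h=1}^{T}\gamma_f(h) - 2T\sum_{h=1}^{T}h\gamma_f(h)/T$. Because $\sum_{h=1}^{T}h\gamma_f(h)/T = o\big(\gamma_f(0) + 2\sum_{h=1}^{T}\gamma_f(h)\big)$, we have $T^{-1/2}(\sigma_f^2)^{-1/2}s_T \to 1$, where $\sigma_f^2 = \gamma_f(0) + 2\sum_{h=1}^{\infty}\gamma_f(h)$. The result then follows. □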
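The expansion of the variance of a sum of a stationary sequence used in this proof can be verified directly. The sketch below is only an illustration under an assumed geometric autocovariance $\gamma(h) = 0.5^{h}$ (not taken from the paper): it sums the full $T \times T$ autocovariance matrix, compares it with $T\gamma(0) + 2T\sum_h\gamma(h) - 2\sum_h h\gamma(h)$, and then checks that the ratio to $T\sigma^2$, with $\sigma^2 = \gamma(0) + 2\sum_{h\ge 1}\gamma(h)$, approaches one.

```python
def var_of_sum_direct(gamma, T):
    """Variance of a sum of T stationary terms: sum of all entries of
    the T x T Toeplitz autocovariance matrix."""
    return sum(gamma(abs(t - s)) for t in range(T) for s in range(T))


def var_of_sum_formula(gamma, T):
    """T*gamma(0) + 2T*sum_h gamma(h) - 2*sum_h h*gamma(h), h = 1..T.
    The h = T term contributes (T - T)*gamma(T) = 0 in total."""
    s1 = sum(gamma(h) for h in range(1, T + 1))
    s2 = sum(h * gamma(h) for h in range(1, T + 1))
    return T * gamma(0) + 2 * T * s1 - 2 * s2


# Assumed autocovariance for illustration only: gamma(h) = 0.5**h
gamma = lambda h: 0.5 ** h
T = 200
long_run_variance = 3.0  # gamma(0) + 2 * sum_{h>=1} 0.5**h = 1 + 2 * 1

direct = var_of_sum_direct(gamma, T)
formula = var_of_sum_formula(gamma, T)
ratio = formula / (T * long_run_variance)
print(direct, formula, ratio)
```

The correction term $2\sum_h h\gamma(h)$ stays bounded when the autocovariances are summable fast enough, so it is negligible relative to $T\sigma^2$; this is the step that replaces $s_T$ by $\sqrt{T\sigma_f^2}$ in the normalization.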

Lemma B.7.

$$\sqrt{T/\sigma_f^2}\; w'(\hat\Sigma_f - \Sigma)w \xrightarrow{d} \mathcal{N}(0,1). \tag{B.2}$$



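Proof. In fact,

$$\sqrt{T/\sigma_f^2}\; w'(\hat\Sigma_f - \Sigma)w = \sqrt{T/\sigma_f^2}\sum_{i=1}^{4} d_i.$$

By Lemma B.6, it suffices to show that $\sqrt{T/\sigma_f^2}\,(d_2 + d_3 + d_4) = o_p(1)$. By Lemma B.2, $\sqrt{T/\sigma_f^2}\,|d_2| = O_p\big((w'\Sigma_u w)^{1/4}(E|w'u_t|^4)^{1/8}/\sqrt{\sigma_f^2}\big) = O_p\big((w'\Sigma_u w)^{1/4}/\sigma_f\big) = o_p(1)$, since $E|w'u_t|^4 = O(1)$. Moreover, Lemma B.4 implies that $\sqrt{T/\sigma_f^2}\,|d_3| = O_p\big((w'\Sigma_u w)^{1/2}(T\sigma_f^2)^{-1/2}\big) = o_p(1)$, since $w'\Sigma_u w = o(\sigma_f^2) = O(1)$. It also follows from Lemma B.4 that $\sqrt{T/\sigma_f^2}\,|d_4| = O_p\big(w'\Sigma_u w\, s_N(\sigma_f^2)^{-1/2}(\log N)^{1/2-q/2}T^{q/2}\big) = o_p(1)$. This implies the desired result. □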



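Proof of Theorem 3.2

The first statement

$$\left[\mathrm{var}\left(\sum_{t=1}^{T}(w'Bf_t)^2\right)\right]^{-1/2} T\,w'(\hat\Sigma_f - \Sigma)w \xrightarrow{d} \mathcal{N}(0,1)$$

follows from (B.1). The assumptions $L\sqrt{(L + \log N)/T} = o(\sigma_f^2)$ and $\sum_{h>L}\gamma_f(h) = o(\sigma_f^2)$, together with Lemma 3.2, imply $|\hat\sigma_f^2 - \sigma_f^2| = o_p(\sigma_f^2)$. Since $w'(\hat\Sigma_f - \Sigma)w = O_p(T^{-1/2}\sqrt{\sigma_f^2})$, we have

$$\sqrt{T}\,|w'(\hat\Sigma_f - \Sigma)w|\,\big|\hat\sigma_f^{-1} - \sigma_f^{-1}\big| = o_p(1).$$

Hence Lemma B.7 gives $\sqrt{T/\hat\sigma_f^2}\; w'(\hat\Sigma_f - \Sigma)w \xrightarrow{d} \mathcal{N}(0,1)$, which validates the H-CLUB.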
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C Proofs for the POET-based Estimation 



Let $\hat V$ denote the $r \times r$ diagonal matrix of the first $r$ largest eigenvalues of $S$ in decreasing order. Let $\hat F = (\hat f_1, \dots, \hat f_T)$ be an $r \times T$ matrix such that the rows of $\hat F/\sqrt{T}$ are the eigenvectors corresponding to the $r$ largest eigenvalues of the $T \times T$ matrix $R'R$. Let $\hat B = R\hat F'/T$. Define an $r \times r$ matrix

$$H = \frac{1}{T}\hat V^{-1}\hat F F'B'B.$$

Then $\hat B$ and $\hat f_t$ can be treated as estimators of $BH^{-1}$ and $Hf_t$ respectively.
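To make the principal-components construction concrete, here is a minimal sketch for the special case $r = 1$, written in plain Python. It finds the leading eigenvector of $R'R$ by power iteration (a swapped-in numerical technique chosen only to avoid external libraries), scales it so that $\hat F\hat F'/T = 1$, and sets $\hat B = R\hat F'/T$. The function name `leading_factor`, the all-ones starting vector, and the toy data are assumptions for illustration.

```python
import math


def leading_factor(R, n_iter=200):
    """Rank-one (r = 1) principal-components factor estimate.

    R is an N x T matrix stored as a list of rows. Returns (F_hat, B_hat):
    F_hat has length T with sum(F_hat[t]**2) / T == 1, and
    B_hat = R F_hat' / T has length N.
    Assumes the all-ones start is not orthogonal to the leading eigenvector.
    """
    N, T = len(R), len(R[0])
    # Gram matrix G = R'R (T x T)
    G = [[sum(R[i][t] * R[i][s] for i in range(N)) for s in range(T)]
         for t in range(T)]
    v = [1.0] * T
    for _ in range(n_iter):
        v = [sum(G[t][s] * v[s] for s in range(T)) for t in range(T)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    F_hat = [math.sqrt(T) * x for x in v]  # rows of F_hat / sqrt(T) are unit eigenvectors
    B_hat = [sum(R[i][t] * F_hat[t] for t in range(T)) / T for i in range(N)]
    return F_hat, B_hat


# Toy noiseless one-factor data R = b f' (assumed values)
b = [1.0, 2.0, -1.0]
f = [1.0, -1.0, 2.0, 0.5]
R = [[bi * ft for ft in f] for bi in b]
F_hat, B_hat = leading_factor(R)
print(F_hat, B_hat)
```

In this noiseless example the product $\hat B\hat F$ reproduces $R$ exactly, while $\hat F$ and $\hat B$ individually recover $f$ and $b$ only up to a scalar, which is the role played by the rotation $H$ in the text.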



C.1 Proof of Lemma 3.3



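Lemma C.1. (i) $\|w'\hat B\| = O_p(1)$, and $\|w'(\hat B - BH^{-1})\| = O_p\big(N^{-1/2} + (\log N/T)^{1/2}\big)$.
(ii) $\|\hat F - HF\|_F^2/T = T^{-1}\sum_{t=1}^{T}\|\hat f_t - Hf_t\|^2 = O_p(N^{-1} + T^{-2})$.
(iii) $\|w'E\|^2 = O_p(T)$.
(iv) $\big\|T^{-1}\sum_{t=1}^{T}[\hat f_t\hat f_t' - Hf_t(Hf_t)']\big\| = O_p(N^{-1/2} + T^{-1})$.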

Proof. (i) By Lemma B.16 in an earlier version of Fan et al. (2013)$^1$, $\|\hat B\|_{\max} \le \|\hat B - BH^{-1}\|_{\max} + \|BH^{-1}\|_{\max} = O_p(1)$. Thus $\|w'\hat B\|^2 \le r\|\hat B\|_{\max}^2\|w\|_1^2 = O_p(1)$. On the other hand, $\|w'(\hat B - BH^{-1})\|^2 \le r\|\hat B - BH^{-1}\|_{\max}^2\|w\|_1^2 = O_p(1/N + \log N/T)$.
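(ii) By (A.1) in Bai (2003), the following identity holds:

$$\hat f_t - Hf_t = \left(\frac{\hat V}{N}\right)^{-1}\left(\frac{1}{T}\sum_{s=1}^{T}\hat f_s E(u_s'u_t)/N + \frac{1}{T}\sum_{s=1}^{T}\hat f_s\zeta_{st} + \frac{1}{T}\sum_{s=1}^{T}\hat f_s\eta_{st} + \frac{1}{T}\sum_{s=1}^{T}\hat f_s\xi_{st}\right) \tag{C.1}$$

where $\zeta_{st} = u_s'u_t/N - E(u_s'u_t)/N$, $\eta_{st} = f_s'\sum_{i=1}^{N} b_iu_{it}/N$, and $\xi_{st} = f_t'\sum_{i=1}^{N} b_iu_{is}/N$. It follows from Lemma C.7 in Fan et al. (2013) that

$$\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat f_{is}\zeta_{st}\Big)^2 + \frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat f_{is}\eta_{st}\Big)^2 + \frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat f_{is}\xi_{st}\Big)^2 = O_p\Big(\frac{1}{N}\Big).$$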

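Moreover, by Lemma C.9 of Fan et al. (2013), $\max_{i\le r}\frac{1}{T}\sum_{t=1}^{T}(\hat f_t - Hf_t)_i^2 = O_p(1/T + 1/N)$.
Applying the inequality $(a+b)^2 \le 2a^2 + 2b^2$ gives,

$$\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\hat f_{is}E(u_s'u_t)/N\Big)^2 \le \frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\big[|(\hat f_s - Hf_s)_i| + |(Hf_s)_i|\big]\,|E(u_s'u_t)|/N\Big)^2$$
$$\le \frac{2}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}|(\hat f_s - Hf_s)_i|\,|E(u_s'u_t)|/N\Big)^2 + \frac{2}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}|(Hf_s)_i|\,|E(u_s'u_t)|/N\Big)^2.$$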



$^1$ downloadable from http://terpconnect.umd.edu/~yuanliao/factor2/factor2.html
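By the Cauchy-Schwarz inequality and that $\max_{t\le T}\sum_{s=1}^{T}|E(u_s'u_t)/N|^2 = O(1)$,

$$\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}|(\hat f_s - Hf_s)_i|\,|E(u_s'u_t)|/N\Big)^2 \le \max_{i\le r}\frac{1}{T}\sum_{s=1}^{T}(\hat f_s - Hf_s)_i^2\;\max_{t\le T}\frac{1}{T}\sum_{s=1}^{T}|E(u_s'u_t)/N|^2 = O_p\Big(\frac{1}{T^2} + \frac{1}{NT}\Big).$$

Also, $\frac{1}{T}\sum_{t=1}^{T}\big(\frac{1}{T}\sum_{s=1}^{T}|(Hf_s)_i|\,|E(u_s'u_t)|/N\big)^2 \le O_p(T^{-1})\sum_{t=1}^{T}\big(\frac{1}{T}\sum_{s=1}^{T}\|f_s\|\,|E(u_s'u_t)|/N\big)^2$.

We have $T^{-1}\sum_{t=1}^{T}\big(\frac{1}{T}\sum_{s=1}^{T}\|f_s\|\,|E(u_s'u_t)|/N\big)^2 = O_p(T^{-2})$ since

$$E\,\frac{1}{T}\sum_{t=1}^{T}\Big(\frac{1}{T}\sum_{s=1}^{T}\|f_s\|\,|E(u_s'u_t)|/N\Big)^2 = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{T^2}\sum_{s=1}^{T}\sum_{l=1}^{T}E(\|f_s\|\|f_l\|)\,\frac{|E(u_s'u_t)|\,|E(u_l'u_t)|}{N^2}$$
$$\le \max_{s}E\|f_s\|^2\;\max_{t}\Big(\frac{1}{T}\sum_{s=1}^{T}|E(u_s'u_t)|/N\Big)^2 = O(T^{-2}).$$

This implies $\max_{i\le r}T^{-1}\sum_{t=1}^{T}(\hat f_t - Hf_t)_i^2 = O_p(N^{-1} + T^{-2})$, and thus $T^{-1}\sum_{t=1}^{T}\|\hat f_t - Hf_t\|^2 = O_p(N^{-1} + T^{-2})$.

(iii) $E\|w'E\|^2 = E\sum_{t=1}^{T}\big(\sum_{i=1}^{N}w_iu_{it}\big)^2 \le T\max_{ij}|Eu_{it}u_{jt}|\,\|w\|_1^2 = O(T)$. Thus $\|w'E\|^2 = O_p(T)$. Finally, (iv) follows from the Cauchy-Schwarz inequality and part (ii).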

□ 

Lemma C.2. $\max_{h\le L}|\hat\gamma_P(h) - \gamma_f(h)| = O_p\big(\sqrt{(L + \log N)/T} + N^{-1/2}\big)$.

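Proof. The triangular inequality implies $\max_{h\le L}|\hat\gamma_P(h) - \gamma_f(h)| \le \sum_{i=1}^{4}a_i$, where

$$a_1 = \max_{h\le L}\Big|T^{-1}\sum_{t=1}^{T-h}(w'\hat B\hat f_{t+h})^2(w'\hat B\hat f_t)^2 - E(w'Bf_t)^2(w'Bf_{t+h})^2\Big|,$$
$$a_2 = |(w'\hat B\hat B'w)^2 - (w'BB'w)^2|, \qquad a_3 = w'\hat B\hat B'w\,\max_{h\le L}\Big|w'\hat B\hat B'w - T^{-1}\sum_{t=1}^{T-h}(w'\hat B\hat f_t)^2\Big|,$$
$$a_4 = w'\hat B\hat B'w\,\max_{h\le L}\Big|w'\hat B\hat B'w - T^{-1}\sum_{t=1}^{T-h}(w'\hat B\hat f_{t+h})^2\Big|.$$

Here $a_1$ is bounded by $a_{11} + a_{12}$, where

$$a_{11} = \max_{h\le L}\Big|T^{-1}\sum_{t=1}^{T-h}(w'Bf_{t+h})^2(w'Bf_t)^2 - E(w'Bf_t)^2(w'Bf_{t+h})^2\Big|,$$
$$a_{12} = \max_{h\le L}\Big|T^{-1}\sum_{t=1}^{T-h}\big[(w'\hat B\hat f_{t+h})^2(w'\hat B\hat f_t)^2 - (w'Bf_{t+h})^2(w'Bf_t)^2\big]\Big|.$$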



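As in the proof of Lemma B.1, $a_{11} = O_p(\sqrt{L/T})$. It follows from the Cauchy-Schwarz inequality that

$$a_{12} = O_p\Big(\max_{h\le L}\Big|T^{-1}\sum_{t=1}^{T-h}\big[(w'\hat B\hat f_{t+h})(w'\hat B\hat f_t) - (w'Bf_{t+h})(w'Bf_t)\big]\Big|\Big)$$
$$= O_p\Big(\max_{h\le L}\Big(T^{-1}\sum_{t=1}^{T-h}(w'\hat B\hat f_{t+h} - w'Bf_{t+h})^2\Big)^{1/2} + \Big(T^{-1}\sum_{t=1}^{T}(w'\hat B\hat f_t - w'Bf_t)^2\Big)^{1/2}\Big)$$
$$= O_p\Big(\|w'(\hat B - BH^{-1})\| + \|w'BH^{-1}\|\Big(T^{-1}\sum_{t=1}^{T}\|\hat f_t - Hf_t\|^2\Big)^{1/2}\Big).$$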
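It follows from Lemma C.1 that $a_{12} = O_p(\sqrt{\log N/T} + N^{-1/2})$, thus $a_1 = O_p\big(\sqrt{(L + \log N)/T} + N^{-1/2}\big)$. On the other hand, for $g_1, \dots, g_5$ defined in (C.2),

$$a_2 = O_p\big(|w'(\hat B\hat B' - BB')w|\big) = O_p\Big(\sum_{i=1}^{5}|g_i|\Big) = O_p\big(\sqrt{\sigma_f^2/T}\big).$$

In addition,

$$a_3 \le O_p(1)\max_{h\le L}\Big|w'\hat B\Big(\frac{1}{T}\sum_{t=T-h+1}^{T}\hat f_t\hat f_t'\Big)\hat B'w\Big| = O_p\Big(\frac{1}{T}\sum_{t=T-L+1}^{T}\|\hat f_t\|^2\Big).$$

We have $\frac{1}{T}\sum_{t=T-L+1}^{T}\|\hat f_t\|^2 \le \frac{2}{T}\sum_{t=T-L+1}^{T}\|\hat f_t - Hf_t\|^2 + \frac{2}{T}\sum_{t=T-L+1}^{T}\|Hf_t\|^2$. On one hand, $\frac{2}{T}\sum_{t=T-L+1}^{T}\|Hf_t\|^2 = O_p(L/T)$. On the other hand, by Theorem 3.3 in Fan et al. (2013), $\max_t\|\hat f_t - Hf_t\| = O_p(1)$, hence $\frac{2}{T}\sum_{t=T-L+1}^{T}\|\hat f_t - Hf_t\|^2 = O_p(L/T)$. Thus $a_3 = O_p(L/T)$. Similarly, we have $a_4 = O_p(L/T)$.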

□ 



Proof of Lemma 3.3

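We have $|\hat\sigma_P^2 - \sigma_f^2| \le \sum_{i=1}^{3}b_i$, where $b_1 = |\hat\gamma_P(0) - \gamma_f(0)|$, $b_2 = 2\sum_{h=1}^{L}|\hat\gamma_P(h) - \gamma_f(h)|$, and $b_3 = 2\sum_{h>L}\gamma_f(h)$. Lemma C.2 implies $b_2 \le 2L\max_{h\le L}|\hat\gamma_P(h) - \gamma_f(h)| = O_p\big(L\sqrt{(L + \log N)/T} + L/\sqrt{N}\big)$, which gives the convergence rate. The second statement is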
due to a? = Op(logA^). 
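The bound above reflects the structure of a truncated long-run variance estimator: a lag-0 term plus twice the sum of the first $L$ sample autocovariances. The sketch below illustrates that structure in plain Python for a generic series $x_t$, which here stands in for the squared factor-driven portfolio return $(w'\hat B\hat f_t)^2$. The function names and the centering by the sample mean are assumptions made for illustration; the paper's exact definition of $\hat\gamma_P(h)$ is not reproduced in this section.

```python
def sample_autocov(x, h):
    """Sample autocovariance at lag h, normalized by T (assumed convention)."""
    T = len(x)
    mean = sum(x) / T
    return sum((x[t] - mean) * (x[t + h] - mean) for t in range(T - h)) / T


def truncated_long_run_variance(x, L):
    """gamma_hat(0) + 2 * sum_{h=1}^{L} gamma_hat(h)."""
    return sample_autocov(x, 0) + 2 * sum(sample_autocov(x, h) for h in range(1, L + 1))


# Tiny worked example (assumed data): x = (1, 2, 3, 4), L = 1
x = [1.0, 2.0, 3.0, 4.0]
print(truncated_long_run_variance(x, 1))  # 1.25 + 2 * 0.3125 = 1.875
```

The three error terms $b_1$, $b_2$, $b_3$ correspond to the lag-0 estimation error, the accumulated error over the $L$ retained lags, and the truncation bias from the lags that are dropped.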



C.2 Proof of Theorem 3.3 



First, $\hat\Sigma_P = \hat B\hat B' + \hat\Sigma_u$. With the identification condition $\mathrm{cov}(f_t) = I_r$, $\Sigma = BB' + \Sigma_u$. Therefore, if we write $C_T = \hat B - BH^{-1}$ and $D_T = T^{-1}\sum_{t=1}^{T}f_tf_t' - \mathrm{cov}(f_t)$, then $w'(\hat\Sigma_P - \Sigma)w = \sum_{i=1}^{5}g_i$, where

$$g_1 = w'BD_TB'w, \qquad g_2 = w'C_T\hat B'w, \qquad g_3 = w'BH^{-1}C_T'w,$$
$$g_4 = w'(\hat\Sigma_u - \Sigma_u)w, \qquad g_5 = w'BH^{-1}\,\frac{1}{T}\sum_{t=1}^{T}\big[\hat f_t\hat f_t' - Hf_t(Hf_t)'\big](H^{-1})'B'w. \tag{C.2}$$

Recall the definition of $d_1$ in Appendix B.2, $g_1 = d_1$. Thus it follows from Lemma B.6 that $\sqrt{T/\sigma_f^2}\,g_1 \xrightarrow{d} \mathcal{N}(0,1)$. We proceed by showing that $\sqrt{T/\sigma_f^2}\,g_i$ are asymptotically negligible for $i = 2, \dots, 5$. These results are given in the following lemmas.
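As a sanity check on the algebra behind this decomposition, the scalar case $N = r = 1$, $w = 1$ can be verified numerically. The sketch below assumes the reading $C_T = \hat B - BH^{-1}$, $g_2 = C_T\hat B$, $g_3 = BH^{-1}C_T$, the normalization $T^{-1}\sum_t\hat f_t^2 = 1$, and $\mathrm{cov}(f_t) = 1$; under these assumptions $g_1 + g_2 + g_3 + g_5$ should equal $\hat B^2 - B^2$, the factor part of $w'(\hat\Sigma_P - \Sigma)w$. All numerical inputs are hypothetical.

```python
def scalar_decomposition(b, b_hat, H, f, f_hat):
    """Return (g1 + g2 + g3 + g5, b_hat**2 - b**2) in the scalar case.

    Assumes sum(f_hat[t]**2) / T == 1 and cov(f_t) = 1 (identification).
    """
    T = len(f)
    D = sum(x * x for x in f) / T - 1.0   # D_T = T^{-1} sum f_t^2 - cov(f_t)
    C = b_hat - b / H                     # C_T = B_hat - B H^{-1}
    g1 = b * D * b
    g2 = C * b_hat
    g3 = (b / H) * C
    g5 = (b / H) ** 2 * sum(fh * fh - (H * x) ** 2 for fh, x in zip(f_hat, f)) / T
    return g1 + g2 + g3 + g5, b_hat ** 2 - b ** 2


# Hypothetical inputs; f_hat is normalized so that its mean square equals 1
f = [0.9, -1.2, 1.1, -0.7]
f_hat = [1.0, -1.0, 1.0, -1.0]
total, target = scalar_decomposition(b=2.0, b_hat=1.8, H=1.1, f=f, f_hat=f_hat)
print(total, target)
```

The identity holds for any values of $b$, $\hat b$, $H$ and $f$ once $\hat f$ is normalized, which mirrors the fact that the decomposition is purely algebraic; the probabilistic content lies in the rates of the individual $g_i$.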

Lemma C.3. (i) $|g_2| = O_p\big(T^{-1/2}(w'\Sigma_u w)^{1/4} + N^{-1/2} + T^{-1}\big)$.
(ii) $|g_3| = O_p\big(T^{-1/2}(w'\Sigma_u w)^{1/4} + N^{-1/2} + T^{-1}\big)$.
(iii) $|g_5| = O_p(N^{-1/2} + T^{-1})$.
Proof. Using the facts that $R = BF + E$, $\hat B = R\hat F'/T$ and $\hat F\hat F'/T = I_r$, we have

$$\hat B - BH^{-1} = BH^{-1}(HF - \hat F)\hat F'/T + E(\hat F - HF)'/T + EF'H'/T.$$

Thus $g_2 = g_{21} + g_{22} + g_{23}$, where $g_{21} = w'BH^{-1}(HF - \hat F)\hat F'/T\,\hat B'w$, $g_{22} = w'E(\hat F - HF)'/T\,\hat B'w$, and $g_{23} = w'EF'H'/T\,\hat B'w$. It was shown by Fan et al. (2013) that $\|H\| = O_p(1)$ and $\|H^{-1}\| = O_p(1)$. Thus by Lemma C.1, $|g_{21}| \le O_p(1)\|HF - \hat F\|\,\|\hat F\|/T = O_p\big(\sqrt{1/N + 1/T^2}\big)$. In addition, $|g_{22}| \le O_p(\sqrt{T})\,\|\hat F - HF\|/T = O_p\big(\sqrt{1/N + 1/T^2}\big)$. Finally, by Lemma B.2, $|g_{23}| \le \|w'EF'\|\,O_p(1/T) = O_p\big(T^{-1/2}(w'\Sigma_u w)^{1/4}\big)$. The proof for the convergence rate of $|g_3|$ is the same, so it is omitted. Finally, the rate of convergence for $|g_5|$ follows from Lemma C.1. □



Proof of Theorem 3.3



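By Lemma C.3 and the assumption that $T\sigma_f^2 \to \infty$, $\sigma_f^2N/T \to \infty$ and $w'\Sigma_u w = o(\sigma_f^2)$, we have $\sqrt{T/\sigma_f^2}\,g_i = o_p(1)$ for $i = 2, 3, 5$. In addition, by Theorem 3.1 of Fan et al. (2013),

$$\|\hat\Sigma_u - \Sigma_u\| = O_p\left(s_N\left(\frac{\log N}{T} + \frac{1}{N}\right)^{1/2-q/2}\right),$$

which then implies that $\sqrt{T/\sigma_f^2}\,g_4 = o_p(1)$, by the assumption $\|w\|^2 = o\big(\sigma_f s_N^{-1}N^{1/2-q/2}T^{-1/2}\big)$. By Lemma B.6, $\sqrt{T/\sigma_f^2}\,g_1 = \sqrt{T/\sigma_f^2}\,d_1 \xrightarrow{d} \mathcal{N}(0,1)$, thus $\sqrt{T/\sigma_f^2}\;w'(\hat\Sigma_P - \Sigma)w \xrightarrow{d} \mathcal{N}(0,1)$. By the assumption $L = o(\sqrt{N}\sigma_f^2)$ and Lemma 3.3, $|\hat\sigma_P^2 - \sigma_f^2| = o_p(\sigma_f^2)$. Since $w'(\hat\Sigma_P - \Sigma)w = O_p(T^{-1/2}\sqrt{\sigma_f^2})$, we have $\sqrt{T}\,|w'(\hat\Sigma_P - \Sigma)w|\,|\hat\sigma_P^{-1} - \sigma_f^{-1}| = o_p(1)$. It follows that $\sqrt{T/\hat\sigma_P^2}\;w'(\hat\Sigma_P - \Sigma)w \xrightarrow{d} \mathcal{N}(0,1)$, which validates the H-CLUB.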



C.3 Proof of Theorem 3.4

The theorem follows from the following lemma. 
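Lemma C.4. Suppose $f_t$ and $u_t$ are independent, and $Ef_t = Eu_t = 0$, then

$$\mathrm{var}\Big(\sum_{t=1}^{T}(w'R_t)^2\Big) = \mathrm{var}\Big(\sum_{t=1}^{T}(w'Bf_t)^2\Big) + \mathrm{var}\Big(\sum_{t=1}^{T}(w'u_t)^2\Big) + \mathrm{var}\Big(2\sum_{t=1}^{T}w'Bf_t\,w'u_t\Big).$$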



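Proof. Since $R_t = Bf_t + u_t$, and $f_t$ and $u_t$ are independent,

$$\mathrm{var}\Big(\sum_{t=1}^{T}(w'R_t)^2\Big) = \mathrm{var}\Big(\sum_{t=1}^{T}\big[(w'Bf_t)^2 + (w'u_t)^2 + 2w'Bf_t\,w'u_t\big]\Big)$$
$$= \mathrm{var}\Big(\sum_{t=1}^{T}(w'Bf_t)^2\Big) + \mathrm{var}\Big(\sum_{t=1}^{T}(w'u_t)^2\Big) + \mathrm{var}\Big(2\sum_{t=1}^{T}w'Bf_t\,w'u_t\Big)$$
$$\quad + 2\,\mathrm{cov}\Big(\sum_{t=1}^{T}(w'Bf_t)^2,\; 2\sum_{t=1}^{T}w'Bf_t\,w'u_t\Big) + 2\,\mathrm{cov}\Big(\sum_{t=1}^{T}(w'u_t)^2,\; 2\sum_{t=1}^{T}w'Bf_t\,w'u_t\Big).$$

It suffices to show the covariance terms are zero. In fact, since $E\,w'Bf_t\,w'u_t = 0$,

$$\mathrm{cov}\Big(\sum_{s=1}^{T}(w'Bf_s)^2,\; \sum_{t=1}^{T}w'Bf_t\,w'u_t\Big) = \sum_{s\le T,\,t\le T}\mathrm{cov}\big[(w'Bf_s)^2,\; w'Bf_t\,w'u_t\big]$$
$$= \sum_{s,t}E\big[(w'Bf_s)^2\,w'Bf_t\,w'u_t\big] = \sum_{s,t}E\big[(w'Bf_s)^2\,w'Bf_t\big]\,E\,w'u_t = 0.$$

Finally, as $Ef_t = 0$ implies $E\,w'Bf_t = 0$, we have

$$\mathrm{cov}\Big(\sum_{s=1}^{T}(w'u_s)^2,\; \sum_{t=1}^{T}w'Bf_t\,w'u_t\Big) = \sum_{s\le T,\,t\le T}\mathrm{cov}\big[(w'u_s)^2,\; w'Bf_t\,w'u_t\big]$$
$$= \sum_{s,t}E\big[(w'u_s)^2\,w'Bf_t\,w'u_t\big] = \sum_{s,t}E(w'Bf_t)\,E\big[(w'u_s)^2\,w'u_t\big] = 0.$$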
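The variance decomposition in Lemma C.4 can be checked exactly on a small discrete example. The sketch below is an illustration under simplifying assumptions that are stronger than the lemma requires: the scalars $a_t = w'Bf_t$ and $e_t = w'u_t$ are i.i.d. over $t$, mutually independent, mean zero, and uniform on short lists of values. It enumerates every outcome for $T = 2$ and compares both sides using exact rational arithmetic. The names `variance` and `lemma_c4_sides` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Population variance of equally likely outcomes, in exact arithmetic."""
    vals = [Fraction(v) for v in values]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def lemma_c4_sides(a_vals, e_vals, T=2):
    """Return (left side, right side) of the variance identity.

    a_vals and e_vals list the equally likely values of a_t = w'Bf_t and
    e_t = w'u_t; each list must have mean zero for the identity to hold.
    """
    total, signal, noise, cross = [], [], [], []
    for a in product(a_vals, repeat=T):
        for e in product(e_vals, repeat=T):
            total.append(sum((x + y) ** 2 for x, y in zip(a, e)))   # sum (w'R_t)^2
            signal.append(sum(x * x for x in a))                    # sum (w'Bf_t)^2
            noise.append(sum(y * y for y in e))                     # sum (w'u_t)^2
            cross.append(2 * sum(x * y for x, y in zip(a, e)))      # 2 sum w'Bf_t w'u_t
    return variance(total), variance(signal) + variance(noise) + variance(cross)


# Mean-zero, asymmetric toy distributions (assumed values)
lhs, rhs = lemma_c4_sides([-2, 1, 1], [-3, 1, 2])
print(lhs, rhs, lhs == rhs)
```

Asymmetric supports are used on purpose: with symmetric $\pm 1$ values the squared terms would be constant and the check would be trivial. The mean-zero condition is what makes the cross-covariances vanish.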
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